SEQUENTIAL QUADRATIC PROGRAMMING
ALGORITHMS 
FOR  OPTIMIZATION 


Francisco  Javier  Prieto,  Ph.D. 
Stanford  University,  1989 


The problem considered in this dissertation is that of finding local
minimizers for a function subject to general nonlinear inequality constraints,
when first and perhaps second derivatives are available. The methods
studied belong to the class of sequential quadratic programming (SQP)
algorithms. In particular, the methods are based on the SQP algorithm
embodied in the code NPSOL, which was developed at the Systems Optimization
Laboratory, Stanford University.

The goal of the dissertation is to develop SQP algorithms that allow
some flexibility in their design. Specifically, we are interested in
introducing modifications that enable the algorithms to solve large-scale
problems efficiently. The following issues are considered in detail:

• The use of approximate solutions for the QP subproblem. Instead of
trying to obtain the search direction as a minimizer for the QP, the
solution process is terminated after a limited number of iterations.
Suitable termination criteria are defined that ensure convergence for an
algorithm that uses a quasi-Newton approximation for the full Hessian.
Theorems concerning the rate of convergence are also given.

• The use of approximations for the reduced Hessian in the construction
of the QP subproblems. For many problems the reduced Hessian is
considerably smaller than the full Hessian. Consequently, there are
considerable practical benefits to be gained by only requiring an
approximation to the reduced Hessian. Theorems are proved concerning
the convergence and rate of convergence for an algorithm that uses a
quasi-Newton approximation for the reduced Hessian when early
termination of the QP subproblem is enforced.

• The use of exact second derivatives. The use of second derivatives,
while having significant practical advantages, introduces new
difficulties: for example, the QP subproblems may be nonconvex, and even a
minimizer for the subproblem is no longer guaranteed to yield a
suitable search direction. It is shown how to construct suitable search
directions from approximate solutions to the QP subproblem. Also,
theorems are proved for the convergence and rate of convergence of
those algorithms.

Finally, some numerical results, obtained from a modification of the code
NPSOL, are presented.

Preface


"The whole of science is nothing more than a
refinement  of  everyday  thinking." 

—  Albert  Einstein 

The  last  forty  years  have  seen  the  introduction  of  numerous  methods  for 
the solution of general nonlinear programs, and an expansion in their use
as  satisfactory  mathematical  models  for  problems  in  many  different  fields 
of  human  activity.  Examples  of  this  use  can  be  found  in  areas  as  diverse 
as  general  equilibrium  models  in  economic  theory,  structural  optimization 
in  mechanical  engineering,  microeconomic  models  of  the  firm  in  business 
administration,  or  optimal  power  flow  in  electrical  engineering,  attesting 
both  to  the  universality  with  which  the  structure  of  the  mathematical  model 
can be recognized in Nature, and also to the existence of efficient methods
to  obtain  accurate  and  satisfactory  answers  to  the  problems  considered. 

Despite the fact that the widespread use of these models would not have
been possible without the existence of efficient solution algorithms, the
opinion is frequently expressed among researchers in the field that no
general-purpose algorithm available at this time combines all the desirable
features, and in particular, that the algorithms available are limited
regarding either the size or the difficulty of the problems they can solve.

The search for more reliable and faster algorithms constitutes the basic
motivation for the work presented in this dissertation. It would have been
presumptuous to have set as a goal the search for answers to all the
unanswered questions left in this field; it has been our objective simply to
explore some aspects promising improvements for algorithms oriented towards
the solution of large-scale problems, on the understanding that it is in this
area where a more substantial amount of work seems left to be done. In any
event, it is our hope that the exploration of these topics, independent of the
setting in which they have been studied, may help to shed some light on
issues of general interest in the field.

The work presented in this dissertation would not have been possible
without the financial assistance provided by the Bank of Spain, and the
earlier results, generous support and assistance of the SOL algorithms group
at Stanford University. Special mention is due to my advisor, Prof.
Walter Murray, who not only suggested the main ideas explored in this
dissertation and guided the course of the work to its present state, but
also found the time for many enlightening conversations on the most diverse
topics. Profs. Philip Gill and Michael Saunders were always willing to answer
my many questions, and provided comments and suggestions from which
this work has benefited greatly; the example of their behavior (and that of
my advisor) has been one of my most important lessons during this period.
Although I had little opportunity to benefit from her presence, Dr. Margaret
Wright will be fondly remembered for her energy and dedication.

I am indebted to Prof. George B. Dantzig for his generous invitation
to visit this department during the summer of 1983; this work is one of
its consequences. It has been a privilege to have him on my dissertation
committee.

I would like to express my gratitude to the students working with the
SOL group, Samuel Eldersveld, Anders Forsgren, Aeneas Marxen and Dulce
Ponceleón, for providing a very pleasant and stimulating atmosphere.
Special thanks must be given to Anders Forsgren for his invaluable comments
and suggestions. I am also deeply grateful to Dr. Ulf Ringertz for his many
intelligent remarks, and for having provided the code for the structural
optimization test problems.

Finally, I would like to thank the faculty members, staff and students at
the  Department  of  Operations  Research,  who  helped  in  many  different  ways 
to  make  this  a  productive  and  enjoyable  experience. 


F.J.  Prieto 
Stanford,  1989 

Chapter 1

Introduction 

In this chapter, we introduce the subject of the report, and give some motivation for the
research undertaken. In addition, a brief summary of previous work in this area is presented.

1.1.  The  problem  and  algorithms 

This report is concerned with issues in the field of nonlinear programming, which in its most
general form is that of finding extreme points (minimizers or maximizers) for a real-valued
function,  subject  to  certain  conditions  on  the  acceptable  values  for  the  variables. 

For the purpose of this work, the problem is assumed to take a more restricted form.
The effort is limited to the determination of local extreme points, and the conditions on the
values of the variables are assumed to be given by a system of nonlinear inequalities. The
nonlinear program considered takes the following form:

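\[
\text{NLP} \qquad \min_{x \in \mathbb{R}^n} \; F(x) \quad \text{subject to} \quad c(x) \ge 0,
\]

where $F : \mathbb{R}^n \to \mathbb{R}$ and $c : \mathbb{R}^n \to \mathbb{R}^m$.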

The most reliable algorithms for solving this problem make use of the derivatives of the
functions defining the problem, when they exist. In this spirit, the algorithms to be studied
try to exploit the structure of the problem by constructing local approximations from the
derivative information available. This requires additional conditions on the form of the
problem; the basic assumption is the twice continuous differentiability of the functions $F$
and $c$. In addition, some other assumptions of a more technical nature are required; these
assumptions will be specified later.

SQP  algorithms 

It is not known in general how to compute a solution of the nonlinear program NLP in
a finite number of iterations (obvious exceptions being the cases of linear and quadratic
programming), and so the algorithms developed for its solution are sequential in nature;
that is, an infinite sequence of points $\{x_k\}_{k=0}^{\infty}$ is generated, such that the limit points of
convergent subsequences are solutions for the problem.

Among sequential algorithms a particular class, that of sequential quadratic programming
(SQP) algorithms, seems to be regarded as the best choice for the solution of small,
dense problems (see Stoer [Sto85] or Gill et al. [GMSW88], for example). The algorithms
considered belong to this family of SQP algorithms, and the concern of our research is to
extend the class of problems for which these algorithms may be an efficient choice.

The  next  paragraphs  are  devoted  to  commenting  upon  some  of  the  features  of  SQP 
algorithms,  and  their  relevance  to  this  work.  We  start  by  describing  the  most  general  form 
that such an algorithm may take.

• The algorithm generates a sequence of points $\{x_k\}$ converging to a solution.

• At each point $x_k$, a linearly constrained quadratic program (QP) approximating
locally the NLP problem is generated, and a direction $p_k$ is obtained from it.

• The next point is defined to be either $x_k + p_k$, or the result of a linesearch from $x_k$
along $p_k$, in such a way that the value of a certain merit function is decreased.
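
The three steps above can be illustrated with a minimal sketch in Python, restricted to a single variable so that the QP subproblem has a closed-form solution. Everything specific here is an assumption made for illustration and is not taken from the report: the function names, the test problem (minimize x^2 subject to x^2 - 1 >= 0), the choice of F''(x) as the curvature in the QP, the l1 penalty merit function with a fixed parameter rho, and the simple halving linesearch.

```python
def solve_qp_1d(g, h, a, c):
    """Minimize g*p + 0.5*h*p**2 subject to a*p + c >= 0 (assumes h > 0, a != 0)."""
    p_free = -g / h          # minimizer of the quadratic without the constraint
    bound = -c / a           # the linearized constraint reads a*p >= -c
    if a > 0:
        return max(p_free, bound)   # feasible set: p >= bound
    return min(p_free, bound)       # feasible set: p <= bound


def l1_merit(x, rho, F, c):
    """Assumed merit function: F(x) + rho * max(0, -c(x))."""
    return F(x) + rho * max(0.0, -c(x))


def sqp_1d(x, F, dF, d2F, c, dc, rho=10.0, tol=1e-10, max_iter=50):
    """Generic SQP iteration for: minimize F(x) subject to c(x) >= 0, x scalar."""
    for _ in range(max_iter):
        # Build the QP at the current point and obtain the direction p.
        p = solve_qp_1d(dF(x), d2F(x), dc(x), c(x))
        if abs(p) < tol:
            break
        # Halve the step along p until the merit function decreases.
        alpha = 1.0
        while (l1_merit(x + alpha * p, rho, F, c) >= l1_merit(x, rho, F, c)
               and alpha > 1e-12):
            alpha *= 0.5
        x = x + alpha * p
    return x


# Hypothetical test problem: minimize x**2 subject to x**2 - 1 >= 0, from x0 = 3.
x_star = sqp_1d(
    3.0,
    F=lambda x: x * x,
    dF=lambda x: 2.0 * x,
    d2F=lambda x: 2.0,
    c=lambda x: x * x - 1.0,
    dc=lambda x: 2.0 * x,
)
print(x_star)  # close to the local minimizer x = 1
```

In this toy setting the QP is solved exactly at every iteration; the sketch only shows how the sequence of points, the subproblem and the linesearch fit together.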

We are not concerned with the study of a general class of algorithms, like the one
described above, but rather with the definition and study of specific algorithms within this
class. Although the particular forms of these algorithms are presented in the following
chapters, we point out here that their most significant characteristics are the use of a
linesearch to determine the next point in the sequence, and the construction of quadratic
subproblems of the form

\[
\min_{p} \; \nabla F(x_k)^T p + \tfrac{1}{2}\, p^T H_k\, p \quad \text{subject to} \quad \nabla c(x_k)\, p \ge -c(x_k),
\]

for some matrix $H_k$ whose properties are described as part of the definition of the different
algorithms considered.


Goal  of  the  report 

Expanding upon previous remarks, this report is specially concerned with modifications to
the way that QP approximations are constructed and solved. The modifications considered
are oriented towards defining more flexible SQP algorithms in order to make them more
suitable for the solution of large-scale problems. Specifically, we wish to relax the usual
assumption that the search direction is obtained as a minimizer of the QP subproblem, and
also to allow the use of exact second derivatives, or to require only an approximation to the
reduced Hessian. Finally, it may be possible to take advantage of the increased flexibility
to improve the performance of SQP methods even on small dense problems.

Incomplete  QP  solution 

Throughout, we develop algorithms that obtain the search direction for a quadratic
subproblem in a limited number of iterations, which often in practice is significantly smaller
than the number required for the computation of a minimizer for the QP subproblem; the
search direction obtained in this form will be referred to as an incomplete QP solution. In
general, the algorithm moves from a starting point satisfying certain mild conditions to the
first stationary point, and the search direction is constructed from the information known
at that point.

The  QP  subproblems  generated  in  the  algorithms  developed  so  far  have  been  normally 
obtained  by  using  quasi-Newton  approximations  to  the  full  or  the  reduced  Hessian;  we  shall 
also consider the option of using the exact Hessian in the definition of $H_k$.

Quasi-Newton approximations generate matrices that are positive definite, and at the
same time allow the condition numbers of the approximating matrices to be controlled. In
this way, a convex subproblem is obtained, and if it is feasible, its solution exists and is
unique. In contrast, the use of exact Hessians can lead to non-convex subproblems; moreover,
$H_k$ may now be singular. On the other hand, it will be seen that the use of the exact
Hessian leads to stronger convergence results and an improved rate of convergence.

Convergence assumptions

The convergence of the algorithms in this family normally requires additional conditions on
the  form  of  the  problem.  An  aim  that  underlies  all  the  work  presented  in  this  report  is  to 
try  to  develop  algorithms  whose  convergence  proofs  make  use  of  a  reasonably  weak  set  of 
assumptions.  The  ones  that  can  be  most  frequently  found  in  the  literature  are: 

• existence and continuity of second derivatives for the objective and constraint
functions;

•  full-rank  Jacobians  at  solutions  of  the  problem; 

•  bounded  (above  and  below)  eigenvalues  for  the  approximations  to  the  Hessian  of  the 
Lagrangian  function; 

•  strict  complementarity  at  solutions  of  the  problem; 

•  existence  of  a  feasible  point  for  each  subproblem; 

•  compactness  of  the  feasible  region,  or  of  the  region  where  the  iterates  lie. 

The  search  direction 

Together with these "regularity" assumptions on the form of the problem, it is necessary to
specify  the  form  of  the  direction  of  movement  obtained  from  the  QP  subproblem,  and  that 
of  the  multiplier  estimates.  In  the  literature,  the  usual  choices  have  been: 

•  the  direction  of  movement  is  obtained  as  the  exact  solution  of  the  QP  subproblem, 
constructed as a convex program;

•  the  multiplier  estimates  to  be  used  are  either  the  QP  multipliers  at  the  last  minimizer 
obtained,  or  the  least-squares  multipliers  at  the  current  point. 

Details  about  these  choices  are  given  in  the  next  section. 

Defining  a  solution 

In  the  previous  paragraphs  several  references  have  been  made  to  solutions  of  the  NLP 
problem.  The  following  remarks  try  to  clarify  what  is  understood  by  a  solution. 

Local solutions can be characterized in terms of what are known as the Karush-Kuhn-Tucker
(KKT) conditions (see for example Fiacco and McCormick [FMC68] or Gill et al.
[GMW81]), given in terms of the first and second derivatives of the Lagrangian function
for the problem. The conditions come in different forms, and in particular there are sets
of necessary conditions, and sets of sufficient conditions, but there is no practical necessary
and sufficient characterization of this form for the general case. Given that the previous
algorithms obtain points that satisfy the necessary conditions on the first and second
derivatives, it is not possible to guarantee that the points obtained correspond to solutions
of the problem, unless additional assumptions are satisfied.

Also,  given  that  no  convexity  assumption  is  made  on  the  functions  defining  the  problem, 
no  a  priori  relationship  can  be  established  between  local  solutions  and  global  solutions:  this 
implies  that  the  algorithms  to  be  presented  will  not  normally  be  able  to  determine  whether 
the  solutions  obtained  are  global  solutions. 

The  following  terms  will  be  used  to  define  what  solution  points  the  algorithms  are  able 
to  find. 

• Stationary point. A feasible point $x$ such that

\[
\nabla F(x) = \nabla c(x)^T \lambda^*, \qquad \lambda_i^* c_i(x) = 0, \quad i = 1, \ldots, m,
\]

for some multiplier vector $\lambda^* \in \mathbb{R}^m$.

• First-order KKT point. A stationary point $x$ such that $\lambda^* \ge 0$.

• Second-order KKT point. A first-order KKT point $x$ such that, if $A$ denotes the rows
of the Jacobian $\nabla c(x)$ corresponding to the constraints having positive multipliers at
$x$,

\[
\forall v \in \mathcal{N}(A), \qquad v^T \nabla^2_{xx} L(x, \lambda^*)\, v \ge 0,
\]

where the Lagrangian function $L$ is defined as

\[
L(x, \lambda) = F(x) - \lambda^T c(x),
\]

and $\nabla^2_{xx} L(x, \lambda)$ denotes the Hessian of the Lagrangian function, when the (partial)
derivatives are taken only with respect to the variable $x$.

In the case when analytical second derivatives are unknown or directions of negative
curvature are not computed, the algorithms to be presented only guarantee that a solution
is a first-order KKT point. When exact Hessians are known and directions of negative
curvature are determined and used, the solution obtained by the algorithm will be a
second-order KKT point.
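
As a small illustration of the first two definitions, the following Python sketch checks numerically whether a point is a stationary point and a first-order KKT point for a problem of the form: minimize F(x) subject to c(x) >= 0. The function names, the tolerance and the two-variable test data are assumptions introduced only for this example.

```python
def is_stationary_point(grad_F, jac_c, c_val, lam, tol=1e-8):
    """Feasibility, grad F(x) = Jc(x)^T lam, and lam_i * c_i(x) = 0 for all i."""
    n, m = len(grad_F), len(lam)
    feasible = all(ci >= -tol for ci in c_val)
    # Residual of the stationarity condition, one entry per variable.
    residual = [grad_F[j] - sum(jac_c[i][j] * lam[i] for i in range(m))
                for j in range(n)]
    stationary = all(abs(r) <= tol for r in residual)
    complementary = all(abs(l * ci) <= tol for l, ci in zip(lam, c_val))
    return feasible and stationary and complementary


def is_first_order_kkt(grad_F, jac_c, c_val, lam, tol=1e-8):
    """A stationary point whose multipliers are nonnegative."""
    nonnegative = all(l >= -tol for l in lam)
    return is_stationary_point(grad_F, jac_c, c_val, lam, tol) and nonnegative


# Hypothetical data: minimize x1^2 + x2^2 subject to x1 + x2 - 1 >= 0,
# evaluated at x = (0.5, 0.5) with multiplier 1.
print(is_first_order_kkt([1.0, 1.0], [[1.0, 1.0]], [0.0], [1.0]))      # True

# Same point and constraint, but with objective -(x1^2 + x2^2): the point is
# stationary with multiplier -1, so it is not a first-order KKT point.
print(is_stationary_point([-1.0, -1.0], [[1.0, 1.0]], [0.0], [-1.0]))  # True
print(is_first_order_kkt([-1.0, -1.0], [[1.0, 1.0]], [0.0], [-1.0]))   # False
```

The second-order condition additionally needs the Hessian of the Lagrangian restricted to the null space of the relevant constraint gradients, which this sketch does not attempt.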

1.2.  Historical  background 

This  section  presents  a  brief  history  of  the  evolution  of  SQP  algorithms.  Surveys  for  this 
area can be found in [GMW81], [Po83] or [GMSW88], for example.

The  origins 

The earliest reference found to methods of this family is Wilson's doctoral dissertation
[Wil63]. His algorithm, formulated for the special case of convex problems, solved an
inequality constrained quadratic subproblem in each iteration, formulated using the exact
Hessian of the Lagrangian function, and obtained the next iterate as $x_k + p_k$ (no linesearch
was performed).

In  general,  a  method  of  this  form  will  not  be  globally  convergent  unless  some  precautions 
are  taken  in  accepting  the  next  step.  Murray  [Mu69]  suggested  a  similar  algorithm,  but  now 
a  linesearch  was  performed  on  the  Q  merit  function,  to  guarantee  global  convergence.  Also, 
quasi-Newton approximations to the Hessian of the Lagrangian function could be used in
the  generation  of  the  subproblem,  relaxing  the  requirement  of  convexity  for  the  problem. 

SQP  algorithms  became  popular  through  the  work  of  Biggs  [Big72],  Han  [Han76]  and 
Powell [Po78] (in the literature SQP methods are sometimes referred to as Wilson-Han-
Powell  algorithms).  Biggs  proposed  an  algorithm  similar  to  the  one  in  [Mu69],  with  the 
difference  that  the  quadratic  subproblem  had  only  equality  constraints,  and  a  term  for  the 
multiplier estimate had been added to the constraints.

The algorithm proposed by Han solved an inequality constrained QP subproblem, where
the Hessian was given by a quasi-Newton approximation to the Hessian of the Lagrangian
function, although it required the assumption that the Hessian was positive definite on the
whole space. Also, the "exact" (or $\ell_1$) penalty function

\[
P(x, \rho) = F(x) + \rho \sum_i \max(0, -c_i(x))
\]

was used as a merit function within the linesearch.

Powell proposed a method similar to the one in [Han76], but he was able to show that
the algorithm converged superlinearly even when the Hessian of the Lagrangian function
was indefinite at the solution.

In  the  next  paragraphs  we  focus  on  the  evolution  of  the  different  elements  of  an  SQP 
algorithm:  the  merit  function,  second-order  information,  the  multiplier  estimate,  etc. 

The  merit  function 

In  all  nonlinearly  constrained  optimization  algorithms  the  choice  of  the  merit  function  is  of 
great  importance,  not  only  because  of  its  role  in  enforcing  global  convergence,  but  also  in 
order  to  ensure  a  satisfactory  performance  of  the  algorithm. 

The $\ell_1$ (exact penalty) merit function has become a very popular choice after being
proposed by Han [Han76] and Powell [Po78] for SQP algorithms. Its advantage is that
for large enough values of the penalty parameter, minimizers for the NLP problem are
unconstrained minimizers for the exact penalty function. On the other hand, the function
is not smooth, and in particular it is not differentiable at the solution of the problem.
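
The role of the penalty parameter can be seen on a one-variable example. The sketch below is only an illustration under stated assumptions: the test problem (minimize (x + 1)^2 subject to x >= 0, whose solution is x = 0 with multiplier 2), the grid and the function names are all hypothetical. It locates the minimizer of the exact penalty function on a grid for one value of the parameter above the multiplier and one below it.

```python
def l1_penalty(x, rho):
    """Exact penalty F(x) + rho * max(0, -c(x)) with F(x) = (x + 1)^2, c(x) = x."""
    F = (x + 1.0) ** 2
    c = x
    return F + rho * max(0.0, -c)


# Uniform grid on [-2, 2] with spacing 0.001 (an arbitrary choice).
grid = [i / 1000.0 for i in range(-2000, 2001)]


def grid_minimizer(rho):
    """Grid point with the smallest penalty value."""
    return min(grid, key=lambda x: l1_penalty(x, rho))


print(grid_minimizer(4.0))  # 0.0: rho exceeds the multiplier, the solution is recovered
print(grid_minimizer(1.0))  # -0.5: rho is too small, the minimizer is infeasible
```

For rho = 4 the penalty function has a kink at x = 0, with slope -2 on the left and +2 on the right, which is the lack of differentiability at the solution mentioned above.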

Another option is the use of the augmented Lagrangian

\[
L(x, \lambda, \rho) = F(x) - \lambda^T c(x) + \tfrac{1}{2} \rho\, c(x)^T c(x)
\]

as the merit function. It must be noted that this function includes an additional set of
variables, the Lagrange multiplier estimates $\lambda$. In order to compute the correct value of the
original variables $x$, it is necessary to obtain the correct value for the multiplier estimate.
In fact, this merit function has the property that, if the optimal multiplier vector is used,
there exists a finite value of the parameter $\rho$ such that the solution of the problem is an
unconstrained minimizer of the merit function.

A property of this merit function is that it is smooth. In extensive tests, the performance
of algorithms using this merit function has been superior to that of methods using the exact
penalty function. On the other hand, any algorithm that makes use of this merit function
needs to take special care of the way the multipliers are estimated; a bad estimate may
inhibit convergence or degrade the performance of the method. The theoretical analysis of
these algorithms is also more complex because the additional variables $\lambda$ need to be taken
into account. The use of this merit function in an SQP framework was first suggested by
Wright [Wri76] and Schittkowski [Sch81].
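
The dependence of the augmented Lagrangian on both the multiplier estimate and the penalty parameter can be sketched on a one-variable example. All specifics below are assumptions for illustration: the constraint is treated as an equality c(x) = 0, the test problem is F(x) = x - x^2 with c(x) = x (solution x = 0, optimal multiplier 1), and the names and grid are hypothetical.

```python
def augmented_lagrangian(F, c, x, lam, rho):
    """F(x) - lam^T c(x) + 0.5 * rho * c(x)^T c(x), with c(x) returned as a list."""
    cx = c(x)
    linear_term = sum(l * ci for l, ci in zip(lam, cx))
    quadratic_term = 0.5 * rho * sum(ci * ci for ci in cx)
    return F(x) - linear_term + quadratic_term


def F(x):
    return x - x * x


def c(x):
    return [x]


# Uniform grid on [-1, 1] with spacing 0.001 (an arbitrary choice).
grid = [i / 1000.0 for i in range(-1000, 1001)]


def grid_minimizer(lam, rho):
    """Grid point with the smallest augmented Lagrangian value."""
    return min(grid, key=lambda x: augmented_lagrangian(F, c, x, lam, rho))


print(grid_minimizer([1.0], 4.0))  # 0.0: optimal multiplier and large enough rho
print(grid_minimizer([0.0], 4.0))  # -0.5: a poor multiplier estimate moves the minimizer
print(grid_minimizer([1.0], 1.0))  # an endpoint of the grid: rho is too small
```

With the optimal multiplier the function reduces to (rho/2 - 1) x^2 in this example, so any rho > 2 makes the solution an unconstrained minimizer, in line with the finite-parameter property stated above.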

The search direction

An  important  element  of  the  algorithms  presented  in  this  report  is  the  use  of  an  incomplete 
solution of the QP subproblem as the search direction for the merit function.

In the large-scale case, the number of QP steps required to obtain a minimizer for the
QP subproblems, particularly in the early iterations, may be very high. Regardless of the
inefficiency this may introduce, practical implementations must impose a strict upper limit
on the number of QP steps. There is therefore a definite interest in defining an incomplete
solution whose computation requires a strictly limited number of steps.

Although there have been proposals in the literature to terminate the solution process for
the  QP  subproblems  early,  the  great  majority  of  SQP  algorithms,  including  those  mentioned 
earlier  in  this  section,  define  the  search  direction  from  a  minimizer  for  the  QP  subproblem. 

An  approach  solving  QP  subproblems  inexactly  is  described  in  Dembo  and  Tulowitzki 
[DT85],  where  for  a  generic  SQP  algorithm  an  early  termination  rule  is  given  in  terms  of 
the norm of the reduced gradient for the subproblem. This rule gives a search direction $p_k$
satisfying the condition

\[
\| p_k - p_k^* \| = o(\| p_k^* \|),
\]

where $p_k^*$ denotes the minimizer for the $k$th QP subproblem.

We  follow  a  different  approach,  presenting  an  early  termination  rule  that  is  constructive 
in  nature,  and  that  has  a  guaranteed  bound  on  the  effort  necessary  to  satisfy  it. 

The  multiplier  estimate 

An  important  aspect  in  the  efficient  implementation  of  methods  using  merit  functions  based 
on  the  Lagrangian  function  is  how  to  select  the  approximation  to  the  Lagrange  multipliers 
$\lambda$ in each iteration.

Most SQP algorithms (for example, [Han76] or [Po78]) define $\lambda$ as $\pi$, the QP multiplier
obtained at the solution of the previous subproblem: $\lambda_{k+1} = \pi_k$, where

\[
\nabla F(x_k) + H_k p_k = \nabla c(x_k)^T \pi_k,
\]
\[
\pi_k^T \bigl( \nabla c(x_k)\, p_k + c(x_k) \bigr) = 0,
\]
\[
\pi_k \ge 0.
\]


Unfortunately,  in  this  case  the  change  in  the  Lagrangian  function  is  no  longer  monotonic 
whenever  the  multiplier  estimate  is  updated. 

An alternative is to use the least-squares multiplier estimate $\lambda_L$,


\[
\lambda_L(x_k) = \bigl( \nabla c(x_k) \nabla c(x_k)^T \bigr)^{-1} \nabla c(x_k) \nabla F(x_k),
\]


and to treat it as a function of $x$, rather than as an additional variable, simplifying the
theoretical analysis of the algorithm. This idea appears to have been first introduced by
Fletcher [Fle70], where it was used to construct an augmented Lagrangian merit function
in order to solve an equality-constrained problem. For problem NLP with only equality
constraints, Powell and Yuan [PY86] have considered the use of an augmented Lagrangian
merit function that estimates the multipliers by $\lambda_L$, and they have shown several global and
local convergence properties for this function.

Another option, compatible with the use of the QP multipliers from the previous
iteration, is to treat the multiplier estimate as an additional set of variables in the linesearch.
This idea was suggested by Tapia [Tap77] for equality constrained optimization, and
Schittkowski [Sch81] introduced it in an SQP framework. A proof that the sequence $\{x_k\}$
converges to a first-order KKT point and the multiplier estimates converge to $\lambda^*$ is given in
Gill et al. [GMSW86b].


Trust-region  methods 

An  alternative  to  the  use  of  a  linesearch  on  a  merit  function  to  ensure  global  convergence 
is  the  trust-region  approach,  where  the  size  of  the  step  is  limited  by  imposing  a  constraint 
on  the  norm  of  the  solution  for  the  QP  subproblem. 

In this framework, Fletcher [Fle85] proposed an algorithm that solved a quadratic
subproblem minimizing the Lagrangian function for the QP subproblem, subject to a bound
on the $\|\cdot\|_\infty$ norm of the solution.

Another  application  of  this  idea  is  given  by  Celis,  Dennis  and  Tapia  [CDT85]  for  the  case 
when  only  equality  constraints  are  present.  Their  algorithm  is  related  to  the  conventional 
trust-region  approach  in  unconstrained  optimization,  in  the  sense  that  they  impose  a  bound 
on the value of the $\|\cdot\|_2$ norm of the solution. Also, the linearized constraints are replaced
by  a  second  bound  on  the  norm  of  their  violation. 

The algorithms we consider make use of a linesearch, and trust-region constraints are
not  specifically  included  in  the  QP  subproblems. 

Second derivative information

Several alternatives have been considered in the literature for the construction of the matrix
$H_k$ containing the second-order information for the quadratic subproblem.

It was mentioned earlier that in the first SQP algorithm proposed, $H_k$ was taken to be
the Hessian of the Lagrangian function at the current iterate. When the NLP problem is
convex, there are no special difficulties in solving the subproblem.

If the convexity assumption is not satisfied, as is often the case in practice, the
subproblem can become much more difficult to solve. To avoid this risk, and to extend the
algorithm to cases where analytic derivatives may not be available, the most frequent choice
of $H_k$ has been the use of a positive definite quasi-Newton approximation to the full
Hessian of the Lagrangian function. In this way, a convex subproblem is still obtained, and
the subproblems can be solved efficiently. A detailed discussion of quasi-Newton updates
can be found, for example, in Dennis and Moré [DM77] and Dennis and Schnabel [DS83].
Also, a description of different approaches to the implementation of this idea in an SQP
framework is presented in Gurwitz [Gur87].

A  difficulty  with  this  scheme  is  that  the  Hessian  of  the  Lagrangian  function  is  rarely 
positive  definite  on  the  whole  space  (even  at  a  solution).  It  is  likely  therefore  that  the  use 
of quasi-Newton updates such as the BFGS method will lead to indefinite approximations.
Several  alternatives  have  been  proposed  to  compensate  for  this  problem.  Powell  [Po78] 
presented  a  modification  of  BFGS  for  which  positive  definiteness  was  preserved  and  two-step 
superlinear  convergence  was  achieved.  Another  possibility  is  to  approximate  the  Hessian  of 
the  augmented  Lagrangian  function,  where  the  penalty  parameter  has  been  selected  large 
enough  so  that  the  Hessian  can  be  kept  positive  definite;  see  Biggs  [Big72],  Tapia  [Tap77] 
and Han [Han77].

Following the development of efficient QP solvers for indefinite problems, some updating
methods have recently been proposed for which only the positive definiteness of $Z_k^T H_k Z_k$
is preserved, where $Z_k$ denotes a basis for the null space of the Jacobian of the active
constraints at $x_k$. The motivation for these approaches is that at the solution
$Z^T \nabla^2_{xx} L(x, \lambda) Z$ will normally be positive definite. For this type of update, see for
example Fenyes [Fen87].
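
A two-variable example shows how a matrix can be indefinite on the whole space while its reduced form is positive definite. The sketch below is an illustration only: the matrices H, A and Z are hypothetical data, and the small helper functions are written out so that the example has no external dependencies.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]


# Hypothetical data: H has eigenvalues 1 and -2, so it is indefinite.
H = [[1.0, 0.0],
     [0.0, -2.0]]
# Jacobian of a single active constraint (one row).
A = [[0.0, 1.0]]
# Columns of Z span the null space of A.
Z = [[1.0],
     [0.0]]

print(matmul(A, Z))                                 # [[0.0]], so A Z = 0
reduced_hessian = matmul(transpose(Z), matmul(H, Z))
print(reduced_hessian)                              # [[1.0]], positive definite
```

Here the negative curvature of H lies entirely along the direction excluded by the active constraint, so only the 1-by-1 reduced matrix needs to be positive definite.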

Another alternative along a similar line is to try to approximate only the reduced Hessian
$Z_k^T H_k Z_k$. This scheme has the advantage of requiring the storage of a matrix that in many
cases is significantly smaller than the full Hessian. Reduced Hessian updating methods have
been proposed, among others, by Murray and Wright [MW78], Coleman and Conn [CC84],
Nocedal and Overton [NO85] and Gilbert [Gil87]. A study of the convergence properties of
these methods for the case when only equality constraints are present is given in Byrd and
Nocedal [BN88].
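The distinction between the full and the reduced Hessian can be illustrated with a small sketch. The matrices `H`, `A` and `Z` below are hypothetical values chosen only for illustration (they do not come from the text): the full Hessian is indefinite, yet its projection onto the null space of the active-constraint Jacobian is positive definite.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

# Hypothetical data: H is indefinite (eigenvalues 2 and -1).
H = [[2.0, 0.0],
     [0.0, -1.0]]
# One active constraint with gradient (0, 1); A is its 1 x 2 Jacobian.
A = [[0.0, 1.0]]
# The single column of Z spans the null space of A, so A Z = 0.
Z = [[1.0],
     [0.0]]

AZ = matmul(A, Z)                                     # should be the zero matrix
reduced_hessian = matmul(transpose(Z), matmul(H, Z))  # Z^T H Z, here 1 x 1
print(AZ, reduced_hessian)
```

In this example the reduced Hessian is the 1 x 1 matrix containing 2, so it is positive definite even though `H` is not.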

1.3.  Contents  of  subsequent  chapters 

Chapter 2 describes the form of the general algorithm, whose variants will be studied in
Chapters 4, 5 and 6. The conditions on the search direction and the multiplier estimate
are  presented,  the  assumptions  used  for  the  convergence  proofs  are  introduced,  and  several 
results  bearing  on  the  reasonableness  of  the  previous  conditions  are  presented  and  proved. 

Chapter 3 presents all results that are common to the convergence proofs for the different
algorithms. Given that the algorithms studied are defined to share many elements (the merit
function, the determination of the search direction, termination conditions for the linesearch,
etc.), it has been considered convenient to group in this chapter the results common to all
convergence proofs.

Chapter 4 studies the convergence properties of an algorithm that uses a quasi-Newton
approximation to the full Hessian, and a search direction constructed from information
available at a stationary point of the QP subproblem. It is shown that such an algorithm
is globally convergent (that is, it converges to a solution from any initial point), and that
it converges superlinearly under mild assumptions.

Chapter 5 considers the variant of the algorithm in which a quasi-Newton approximation
to the reduced Hessian is used, again utilizing only information at a stationary point of the
QP subproblem. This algorithm is also shown to be globally convergent, but it converges
two-step superlinearly to the solution.

Chapter 6 presents and studies an algorithm that uses exact second derivatives in the
construction of the QP subproblem. Again, the search direction is obtained from the
information at a stationary point of the quadratic subproblem. It is shown that the algorithm
is globally convergent, and that it converges quadratically to the solution, under mild
assumptions.

Chapter 7 presents numerical results obtained from the implementation of the algorithm
introduced in Chapter 4. Finally, some remarks are included concerning the properties of
all the previous algorithms.


Chapter  2 


The  Algorithm 


Chapters 4, 5 and 6 present and study the convergence properties of three variants of an
SQP algorithm. These methods differ in the way the second-order information for the
QP subproblem (the matrix $H_k$ defined in the previous chapter) is generated, but they
share several common features: the merit function is the same, the search direction is
generated according to similar principles and the linesearch procedure is analogous for the
three methods.

This chapter describes a framework algorithm, composed of the common features
mentioned earlier. Consequently, the following chapters only need to specify details that
differentiate the method presented from the others.

In  addition,  we  enumerate  the  general  assumptions  that  are  needed  in  the  convergence 
proofs  for  the  different  methods.  Again,  it  is  left  to  the  corresponding  chapters  to  complete 
the  list  with  any  additional  assumptions  required  for  each  individual  method  presented. 
Finally,  as  the  framework  algorithm  specifies  conditions  on  the  way  the  search  direction  is 
to  be  computed,  and  on  the  acceptable  forms  that  the  Lagrange  multiplier  estimates  may 
take,  this  chapter  ends  with  a  justification  for  the  reasonableness  of  these  conditions. 

2.1.  Background 

The basis for the algorithms presented in this report is the algorithm NPSQP, as
implemented in the code NPSOL [GMSW86a] developed at the Systems Optimization
Laboratory, Stanford University. For a theoretical discussion of some properties of this algorithm,
[GMSW86b] should be consulted; in fact, this reference has been the main source of
information for the work described in the following chapters.

Since its inception, NPSOL has been shown to be a very efficient code for the solution of
small general nonlinear problems. It provides a good starting point to propose and analyze
modifications to SQP algorithms to make them suitable for the solution of large nonlinear
problems.

One characteristic of NPSQP that poses difficulties in the solution of large problems is
the need to compute the minimizer for the quadratic subproblem. The number of iterations
required to solve the QP subproblem will in general grow with the size of the problem.
This increase in QP iterations raises two issues. First, it is questionable whether the effort
required to compute a minimizer for the QP subproblem can be offset by a sufficiently small
number of subproblems, so that overall efficiency is preserved. Second, any practical QP
algorithm has to impose a limit on the maximum number of QP iterations allowed, and so
there will exist cases in which the exact solution is not obtained; the question then is how
this affects the convergence properties of the algorithm. Both issues can be addressed if we
are able to obtain a satisfactory termination criterion for a QP algorithm that is guaranteed
to be achieved in a "moderate" number of iterations. Here, a "satisfactory" criterion is one
that is efficient, in the sense that the number of nonlinear iterations is not adversely affected.

If  the  solution  process  is  terminated  early,  the  search  direction  for  the  outer  iteration  (the 
step  on  the  original  variables)  is  defined  as  the  "total"  step  taken  in  the  QP  subproblem 
up  to  that  point.  The  characteristics  of  the  point  at  which  the  termination  takes  place 
clearly  depend  on  the  specific  strategy  used  to  solve  the  QP  subproblem.  NPSQP,  and 
the  algorithms  described  later  on,  use  an  active-set  strategy  to  obtain  the  solution  starting 
from  a  feasible  point;  this  strategy  dictates  the  kind  of  termination  conditions  that  can  be 
imposed. As mentioned earlier, the conditions imposed should have the following properties:
they  should  limit  the  number  of  QP  iterations  needed  to  obtain  the  search  direction  to  a 
reasonably  small  value,  and  the  conditions  should  be  easy  to  implement. 

Terminating the QP algorithm prior to obtaining a solution impacts the SQP algorithm
in a number of critical ways. Not only is the search direction obtained now of "lower quality"
than before, but the QP multipliers available will in general not be positive, and it is
necessary to give some rules on what constitutes an acceptable multiplier estimate when
forming the search direction in the multiplier space. The consequences of terminating the
QP solution early are therefore far reaching.

Another  potential  difficulty  when  large  problems  are  considered  is  the  use  of  a  quasi- 
Newton  approximation  to  the  full  Hessian  of  the  Lagrangian  function,  as  it  may  become 
too  large  to  store  in  dense  format,  unless  some  scheme  to  generate  sparse  quasi-Newton 
approximations  is  used. 

One  possible  alternative,  used  for  example  in  the  code  MINOS,  as  described  in  [MS82], 
is  to  work  with  an  approximation  to  the  reduced  Hessian.  For  many  large-scale  problems 
the  size  of  the  reduced  Hessian  is  relatively  small,  and  an  approximation  to  it  may  therefore 
be  stored  in  dense  format. 

Another alternative is to use exact second derivatives. In this case the sparsity of the
second derivatives should alleviate the problem of storing and handling the QP Hessian,
and even for the small-scale case, improvements in the rate of convergence and total
computational work can be expected.

Unfortunately, this latter approach presents some drawbacks. In the first place,
subproblems may no longer be convex, and an indefinite QP solver must be used. Also, a
unique minimizer for the subproblem may not exist, and it is necessary to give conditions
under which a specific minimizer will be an acceptable search direction. In this regard,
it should be noted that while the definition of a satisfactory termination criterion for the
quasi-Newton algorithms is only one aspect in the improvement of their efficiency, for the
Newton-type algorithm the termination criterion is directly related to its convergence
properties. Finally, given that the convergence proofs rely heavily on the similarity of the
convergence properties for the sequences $\{x_k - x^*\}$ and $\{p_k\}$, if the reduced Hessian is close
to singularity it is possible that no minimizer will be acceptable, and alternative termination
criteria need to be specified.

The  preceding  topics  are  our  main  themes.  The  definition  of  the  search  direction  will 
be  introduced  in  this  chapter,  after  the  general  form  of  the  algorithm,  to  be  completed  in 
following chapters, has been specified. The approximation to the second-derivative
information used by each algorithm will be indicated in the corresponding chapters. The next
sections  try  to  provide  the  framework  for  all  subsequent  results. 

2.2.  General  form  of  the  algorithm 


This section introduces the prototype algorithm. Following the remarks made in the
previous section, this algorithm is directly based on NPSQP. The prototype algorithm obtains
the search direction from an incomplete solution for a QP subproblem of the form indicated
in the previous chapter. The iterates are determined by performing a linesearch on the
following merit function:


$$L_A(x, \lambda, s, \rho) = F(x) - \lambda^T (c(x) - s) + \tfrac{1}{2} \rho\, (c(x) - s)^T (c(x) - s), \qquad (2.2.1)$$

where $s \ge 0$ are slack variables, and the scalar $\rho$ is known as the penalty parameter. The
linesearch is performed in the space of the variables $x$, $\lambda$ and $s$, and the corresponding
search directions are denoted by $p$, $\xi$ and $q$.

The symbols $\phi(\alpha, \rho)$, or sometimes just $\phi(\alpha)$, are used to denote

$$\phi(\alpha, \rho) = L_A(x + \alpha p,\ \lambda + \alpha \xi,\ s + \alpha q,\ \rho),$$

that is, the merit function as a function of the steplength. The derivative of $\phi$ with respect
to $\alpha$ is denoted by $\phi'$.
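A minimal sketch of the merit function (2.2.1) and of its restriction to the search direction, written in plain Python. The test problem, the function names `merit` and `phi`, and all numerical values are illustrative assumptions; they are not part of NPSOL or of the algorithms analyzed here.

```python
def merit(F, c, x, lam, s, rho):
    """Augmented Lagrangian merit function (2.2.1):
    F(x) - lam^T (c(x) - s) + 0.5 * rho * ||c(x) - s||^2."""
    resid = [ci - si for ci, si in zip(c(x), s)]
    linear = sum(l * r for l, r in zip(lam, resid))
    quad = sum(r * r for r in resid)
    return F(x) - linear + 0.5 * rho * quad

def phi(F, c, x, lam, s, p, xi, q, rho, alpha):
    """Merit function as a function of the steplength alpha along (p, xi, q)."""
    def step(v, d):
        return [vi + alpha * di for vi, di in zip(v, d)]
    return merit(F, c, step(x, p), step(lam, xi), step(s, q), rho)

# Hypothetical problem: minimize x0^2 + x1^2 subject to x0 + x1 - 1 >= 0.
F = lambda x: x[0] ** 2 + x[1] ** 2
c = lambda x: [x[0] + x[1] - 1.0]

x, lam, s, rho = [0.0, 0.0], [0.0], [0.0], 2.0
# p solves the QP subproblem with H = 2I; its multiplier is 1, so xi = 1 - lam = 1.
# q = A p + c - s = 1 - 1 - 0 = 0.
p, xi, q = [0.5, 0.5], [1.0], [0.0]

phi0 = phi(F, c, x, lam, s, p, xi, q, rho, 0.0)
phi1 = phi(F, c, x, lam, s, p, xi, q, rho, 1.0)
print(phi0, phi1)
```

For this data the merit function decreases from 1.0 at a steplength of zero to 0.5 at the unit step.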

The following conventions will be used in the rest of the report:

$$g_k = \nabla F(x_k), \qquad A_k = \nabla c(x_k), \qquad c_k = c(x_k),$$

although the last two symbols, $A_k$ and $c_k$, will also be used with the same meaning but
restricted to the set of active constraints at the given point. The term active constraint will
be used to designate a constraint that is satisfied exactly at the current point ($c_i(x) = 0$
in the nonlinear problem, or $a_i^T p = -c_i$ in the quadratic subproblem), and the set of all
constraints active at a given point will be referred to as the active set at the point.

The objective function for the QP subproblem will be denoted by $\psi_k(p)$,

$$\psi_k(p) = \nabla F(x_k)^T p + \tfrac{1}{2} p^T H_k p.$$

Sometimes, $\psi$ will denote the function of one variable $\psi_k(\alpha) = \psi_k(p + \alpha d)$. Finally,
symbols of the form $\beta_{abc}$ indicate fixed scalars related to properties of the problem, or the
implementation of the algorithm, where "abc" identifies the specific scalar represented.

The  framework  algorithm 


The algorithm described below will be common to the methods studied in the following
chapters, in the sense that the latter will be defined as specific algorithms that lie within
this framework algorithm. The framework algorithm proceeds through the following steps:

(i) Start from a point $x_0$ and an estimate for the Lagrange multipliers $\lambda_0$. Let $H_0$ be
an approximation to the Hessian of the Lagrangian function at $x_0$, satisfying certain
properties, and let $\rho \ge 0$ be the initial value for the penalty parameter.

(ii) At each point $x_k$, form the QP subproblem

$$\begin{array}{ll} \text{minimize} & g_k^T p + \tfrac{1}{2} p^T H_k p \\ \text{subject to} & A_k p \ge -c_k, \end{array}$$

where $H_k$ denotes an approximation to the Hessian of the Lagrangian function at
$x_k$, and obtain an incomplete solution $p_k$, satisfying certain conditions to be specified
later. Compute a vector of multipliers $\mu_k$ satisfying a second set of conditions to be
specified. If $p_k = 0$, set $\lambda_k = \mu_k$ and terminate. Otherwise, define $\xi_k = \mu_k - \lambda_k$.

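(iii) Compute $s_k$ from

$$(s_k)_i = \begin{cases} \max(0, (c_k)_i) & \text{if } \rho_{k-1} = 0, \\ \max\left(0, (c_k)_i - \dfrac{(\lambda_k)_i}{\rho_{k-1}}\right) & \text{otherwise.} \end{cases}$$

Find $\rho_k$ such that $\phi'(0)$ (or $\phi''(0)$ if a curvilinear search is used) is bounded away from
zero by some fixed multiple of $\|p_k\|^2$.

Compute $q_k$ from

$$q_k = A_k p_k + c_k - s_k.$$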

(iv) Compute the steplength $\alpha_k$ as follows. If $p_k$ is used as a direction of descent, the
termination conditions for the linesearch are as follows:

If

$$\phi(1) - \phi(0) \le \sigma \phi'(0) \qquad (2.2.3)$$

set $\alpha_k = 1$. Otherwise, find an $\alpha_k \in (0, 1)$ such that

$$\phi(\alpha_k) - \phi(0) \le \sigma \alpha_k \phi'(0) \qquad (2.2.4a)$$

$$\phi'(\alpha_k) \ge \eta \phi'(0), \qquad (2.2.4b)$$

where $0 < \sigma < \eta < \tfrac{1}{2}$.

If $H_k$ is indefinite, a curvilinear search may have to be used. The definition of $\phi$ will
be slightly modified, and the new termination conditions are given in Chapter 6.


(v) Form $H_{k+1}$.

(vi) Update $x_k$ and $\lambda_k$ using

$$x_{k+1} = x_k + \alpha_k p_k, \qquad \lambda_{k+1} = \lambda_k + \alpha_k \xi_k,$$

and repeat the previous steps until convergence is reached.
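Parts of steps (iii) and (iv) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the default values `sigma=1e-4` and `eta=0.4` (chosen to satisfy $0 < \sigma < \eta < \tfrac{1}{2}$), and the one-dimensional model of $\phi$ are hypothetical, and the choice of the penalty parameter is not covered.

```python
def slacks(c, lam, rho_prev):
    """Step (iii): s_i = max(0, c_i) if rho_prev == 0,
    otherwise s_i = max(0, c_i - lam_i / rho_prev)."""
    if rho_prev == 0:
        return [max(0.0, ci) for ci in c]
    return [max(0.0, ci - li / rho_prev) for ci, li in zip(c, lam)]

def slack_direction(A, p, c, s):
    """Step (iii): q = A p + c - s, with A stored as a list of rows."""
    Ap = [sum(a * pj for a, pj in zip(row, p)) for row in A]
    return [api + ci - si for api, ci, si in zip(Ap, c, s)]

def accept_step(phi, dphi, alpha, sigma=1e-4, eta=0.4):
    """Step (iv): test (2.2.3) when alpha == 1, or (2.2.4a)-(2.2.4b) otherwise."""
    decrease = phi(alpha) - phi(0.0) <= sigma * alpha * dphi(0.0)
    if alpha == 1.0:
        return decrease
    return decrease and dphi(alpha) >= eta * dphi(0.0)

# Hypothetical one-dimensional model of the merit function along the search direction.
phi_model = lambda a: 1.0 - a + a * a
dphi_model = lambda a: -1.0 + 2.0 * a

results = [accept_step(phi_model, dphi_model, a) for a in (1.0, 0.5, 0.1)]
print(results)
```

For this model the unit step fails the sufficient-decrease test, the step 0.5 satisfies both conditions, and the step 0.1 is rejected by (2.2.4b) because the slope is still too steep.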

This description of the algorithm still leaves many details to be specified. The
termination criteria for the incomplete solution of the QP subproblem and the conditions on
the multiplier approximation $\mu_k$ are discussed below. The specification of the form of the
approximation to the Hessian of the Lagrangian function, $H_k$, is left to the corresponding
chapters. Finally, for the case when indefinite Hessian matrices are used in the QP
subproblem, the form of the modified search is given in Chapter 6.

The  solution  of  the  QP  subproblem 

As indicated in step (ii) of the algorithm, in each iteration the search direction is
computed as the incomplete solution for the local quadratic programming approximation to the
problem, by moving to a stationary point of the QP subproblem and using the information
available at that point in the way indicated below. The subscript $k$ corresponding to the
iteration number will be dropped in what follows.

(i) An initial feasible point $p_0$ for the QP subproblem is obtained.

When an incomplete solution for the QP subproblem is used to define the search
direction, the choice of $p_0$ becomes critical. If $H_k$ is positive definite and the minimizer
for the QP is used to determine the search direction, then, given the uniqueness of $p_k$,
the choice of $p_0$ is irrelevant. If we determine the search direction from a stationary
point that is not a minimizer, the sequence of stationary points that we compute
depends directly on the value of $p_0$. We wish to define the initial point in such a manner
that, at least in the positive definite case, all stationary points are satisfactory points
at which to terminate the solution process. The condition that we need to impose on
$p_0$ is one that limits the size of its norm, and in particular $\|p_0\|$ will be required to be
small whenever the points $x_k$ are close to $x^*$.

We start by defining vectors $\bar{s}$ and $r$ having components

$$\bar{s}_i = \max(0, c_i - \mu_i),$$

$$r_i = \begin{cases} c_i - s_i & \text{if } |c_i - s_i| < |c_i - \bar{s}_i|, \\ c_i - \bar{s}_i & \text{otherwise;} \end{cases}$$

where $\mu$ denotes a multiplier estimate such that the following property holds:

$$\|x_k - \bar{x}\| \to 0 \;\Rightarrow\; \|c_k - \bar{s}_k\| \to 0$$

when $\bar{x}$ is a stationary point for the NLP problem. From this definition, $r$ has the
following property:

$$\|r\| \le \|c - s\|. \qquad (2.2.5)$$

The initial point $p_0$ should then satisfy:

• If $\hat{c}$ denotes the components of $c$ corresponding to the active constraints at $p_0$,
for some constant $\beta_{pc} > 0$,

$$\|p_0\| \le \beta_{pc} \|\hat{c}\|. \qquad (2.2.6)$$

• For some constant $\beta_{pcs} > 0$,

$$\|p_0\| \le \beta_{pcs} \|r\|. \qquad (2.2.7)$$

It is shown later that these conditions are easily satisfied, given a reasonable rule for
the selection of the initial QP active set. A stronger condition, but perhaps of a more
intuitive nature, would be to select $\|p_0\| \le \beta_{cm} \|c^-\|$, where $c^-$ denotes the vector of
negative components of $c$ (the norm of the infeasibilities at the current point). In this
case, we would be requiring $\|p_0\|$ to be small whenever we are close to a feasible point
(and not necessarily just close to a stationary point). Its disadvantage is that near a
solution this rule could prevent the algorithm from having some desirable properties
(such as having one QP iteration per major iteration, for example).

(ii) A sequence of Newton steps is taken until a stationary point for the QP subproblem,
$\hat{p}$, is found.


(iii) If the stationary point is a second-order KKT point, the search direction is defined as
$p = \hat{p}$.

(iv) If the stationary point is not a second-order KKT point, either the QP multiplier
vector has some components that are negative, or the reduced Hessian (assuming
that exact second derivatives are used) has negative eigenvalues. In this case, an
additional step, $\hat{p} + \alpha d$, may need to be taken, where $\alpha$ and $d$ should satisfy the
conditions indicated below.

If  the  multiplier  vector  has  negative  elements,  the  conditions  on  the  step  are: 


C1. $d$ is feasible with respect to the active constraints, $A d \ge 0$, and its norm is
bounded above and below, that is, for some constants $\beta_{und} \ge \beta_{lnd} > 0$ it holds
that $\beta_{und} \ge \|d\| \ge \beta_{lnd}$. It is assumed that $\beta_{lnd} \le 1$, in order to simplify the
arguments in the following chapters.

C2. The rate of descent along $d$ is sufficiently large. If $\psi(\alpha) = \psi(\hat{p} + \alpha d)$, it is required
that

$$\psi'(0) = (H \hat{p} + g)^T d \le -\beta_{dsc} \max_i \mu_i^- \qquad (2.2.8)$$

for some constant $\beta_{dsc} > 0$.


C3. The steplength $\alpha$ is defined as the step to the minimizer of the quadratic function
$\psi(\alpha)$, given by $-\psi'(0)/(d^T H d)$, if $\psi$ is convex and this step is feasible. Let $\alpha_c$
denote the step to the nearest inactive constraint, and define

$$\alpha_m = \begin{cases} -\dfrac{\psi'(0)}{d^T H d} & \text{if } d^T H d > 0, \\ \infty & \text{otherwise.} \end{cases} \qquad (2.2.9)$$

Then

$$\alpha = \min(\alpha_c, \alpha_m, \alpha_M), \qquad (2.2.10)$$

where $\alpha_M > 0$ is a specified bound on the largest acceptable step.


If  the  multiplier  vector  is  non-negative  and  the  reduced  Hessian  is  indefinite,  the 
conditions  are: 


C4. A direction of negative curvature $d$ for the reduced Hessian is computed satisfying

$$\|d\| = 1, \qquad d^T H d \le \beta \lambda_{\min}, \qquad A d = 0, \qquad g^T d \le 0,$$

where $\lambda_{\min}$ indicates the smallest eigenvalue of the reduced Hessian, and $A$
denotes the Jacobian corresponding to the active set at $\hat{p}$.
(A weaker condition that is sufficient for the convergence of these algorithms is
that for any sequence $\{d_k\}$,

$$\frac{d_k^T H_k d_k}{d_k^T d_k} \to 0 \;\Rightarrow\; \lambda_{\min} \to 0$$

holds.)

C5. Let $\alpha_c$ be the step to the nearest constraint. The step $\alpha$ is defined as

$$\alpha = \min(\alpha_c, \alpha_M).$$


Finally,  for  both  cases  we  impose  the  following  condition: 


C6. It is a desirable property to avoid having search directions with very small norms,
unless the corresponding point is close to a solution. The following condition is
sufficient to ensure this property. Define

$$p = \begin{cases} \hat{p} + \alpha d & \text{if } \|\hat{p}\| < \beta_{slp} \|\hat{p} + \alpha d\|, \\ \hat{p} & \text{otherwise,} \end{cases} \qquad (2.2.11)$$

for some constant $\beta_{slp} > 0$. In what follows it will be required that $\beta_{slp} \ge 1$.


It should be noted that in the case when $H_k$ is obtained from the exact second
derivatives, the previous rules are not sufficient for the determination of the search direction; the
complete set of rules will be presented in Chapter 6.
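The steplength rule of conditions C3 and C5 and the selection rule of condition C6 can be sketched as follows. This is a hedged illustration: the function names, argument names and the default `beta_slp=1.0` are assumptions introduced here, and the quantities $\psi'(0)$, $d^T H d$ and the step to the nearest constraint are taken as given inputs rather than computed from a QP.

```python
import math

def qp_step_length(dpsi0, curvature, alpha_c, alpha_max):
    """Steplength along d from a stationary point, following (2.2.9)-(2.2.10).

    dpsi0     -- psi'(0), the directional derivative of the QP objective along d
    curvature -- d^T H d
    alpha_c   -- step to the nearest inactive constraint
    alpha_max -- bound on the largest acceptable step
    """
    # Step to the minimizer along d if the quadratic is convex along d, else unbounded.
    alpha_m = -dpsi0 / curvature if curvature > 0 else math.inf
    return min(alpha_c, alpha_m, alpha_max)

def choose_direction(p_hat, d, alpha, beta_slp=1.0):
    """Rule (2.2.11): use p_hat + alpha*d when ||p_hat|| < beta_slp * ||p_hat + alpha*d||."""
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    trial = [pi + alpha * di for pi, di in zip(p_hat, d)]
    return trial if norm(p_hat) < beta_slp * norm(trial) else list(p_hat)

step_convex = qp_step_length(-2.0, 4.0, 1.0, 10.0)       # minimizer along d is reached first
step_nonconvex = qp_step_length(-2.0, -1.0, 3.0, 10.0)   # nearest constraint limits the step
p_longer = choose_direction([0.0, 0.0], [1.0, 0.0], 0.5)
p_kept = choose_direction([1.0, 0.0], [-1.0, 0.0], 0.5)
print(step_convex, step_nonconvex, p_longer, p_kept)
```

When the curvature is not positive the minimizer along `d` does not exist, so the step is limited only by the nearest constraint and the bound on the largest acceptable step, which matches condition C5.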

The  multiplier  estimates 

Step (ii) of the algorithm requires not only a search direction $p_k$, but also an estimate
$\mu_k$ for the Lagrange multipliers at the current point. The QP solution is terminated at a
stationary point, so a natural choice would be to use the QP multipliers as the estimate,
but in general these may not be the best possible choice, as they may be negative, or the
active set associated with the search direction may not in some cases be the same as the
one for which the multiplier was obtained. The following set of conditions on $\mu_k$ is sufficient
to ensure that the algorithms have the desired convergence properties.


C7. The estimates are uniformly bounded in norm.

C8. $\|\mu_k - \lambda^*\| = O(\|x_k - x^*\|)$, where $\lambda^*$ denotes the multiplier vector associated with
the solution point closest to $x_k$.

C9. The complementarity condition $\mu_k^T (A_k p_k + c_k) = 0$ is satisfied at all iterations.

2.3.  Assumptions  and  bounds 

The  algorithm  will  be  applied  to  a  problem  satisfying  the  following  general  assumptions: 
A1. $x_k$ lies in a closed, bounded region $\Omega \subset \mathbb{R}^n$, for all $k$.

A2. $F$, $c$, and their first and second derivatives are continuous and uniformly bounded in
norm on $\Omega$.

A3.  The  Jacobian  corresponding  to  the  active  constraints  at  any  limit  point  of  the  sequence 
generated  by  the  algorithm  has  full  rank. 

A4.  The  quadratic  subproblems  are  always  feasible;  furthermore,  there  exists  a  subset 
of  linearly  independent  constraints  corresponding  to  the  violated  constraints  for  the 
NLP  problem,  such  that  its  condition  number  is  bounded  and  its  least-norm  solution 
is  feasible. 

A5. Strict complementarity holds at all stationary points for the nonlinear program in $\Omega$.
A6.  The  reduced  Hessian  is  non-singular  at  all  solution  points  for  the  problem. 

The  bounds 

From  the  previous  assumptions,  several  quantities  are  uniformly  bounded  in  the  algorithm. 
We  introduce  the  notation  that  will  be  used  throughout  the  following  chapters  for  some  of 
these  bounds.  The  first  three  bounds  follow  from  assumption  A2;  the  fourth  follows  from 
A3. 

$\beta_{nmA}$ is a bound for the norm of the Jacobian: $\|A_k\| \le \beta_{nmA}$.

$\beta_{nmc}$ is a bound for the norm of the constraint vector: $\|c_k\| \le \beta_{nmc}$.

$\beta_{nmg}$ is a bound for the norm of the gradient: $\|g_k\| \le \beta_{nmg}$.

$\beta_{nm\mu}$ is an upper bound for the norm of the multipliers corresponding to a minimizer for
the QP subproblem: $\|\hat{\mu}_k\| \le \beta_{nm\mu}$.

2.4.  Auxiliary  results 

This  section  presents  a  certain  number  of  basic  results,  either  justifying  the  conditions 
introduced  before,  or  establishing  properties  to  be  used  in  the  following  chapters. 

Initial  points  for  the  QP  subproblem 

It  is  of  interest  to  show  that  the  condition  on  step  (i)  for  the  solution  of  the  QP  subproblem 
can  be  satisfied.  In  fact,  the  role  of  assumption  A4  is  to  guarantee  that  this  condition  can 
be  achieved.  Condition  (2.2.6)  is  satisfied  if  the  Jacobians  for  the  initial  active  sets  have 
bounded  condition  numbers.  Condition  (2.2.7)  requires  some  additional  justification. 

From  A4  it  follows  that  there  exist  feasible  points  for  the  QP  subproblem  satisfying  the 
condition 

$$\|p_0\| \le \beta_{cm} \|c^-\|,$$

for some positive constant $\beta_{cm}$.

Consider now the following relationship, which will be often used in the next chapters.
For any vector $v$ defined as $v_i = \min(c_i, w_i)$, where $w$ is any other vector, it holds that
$\|c^-\| \le \|v\|$, since

if $c_i^- = 0$ then $c_i^- \le |v_i|$,

if $c_i^- > 0$ then if $v_i = c_i$ then $c_i^- = |v_i|$,

if $v_i = w_i$ then $c_i^- \le |w_i| = |v_i|$.

This implies

$$\|c^-\| \le \|c - \bar{s}\|, \qquad \|c^-\| \le \|c - s\|$$

and

$$\|c^-\| \le \|r\| \le \|c - s\|. \qquad (2.4.1)$$
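The componentwise argument behind $\|c^-\| \le \|v\|$ can be checked numerically. The sketch below uses hypothetical vectors `c` and `w`, and interprets $c^-$ as the vector of magnitudes of the negative components of $c$ (zero where $c_i \ge 0$), which is an assumption consistent with the case analysis above.

```python
import math

def neg_part(c):
    """c^-: magnitudes of the negative components of c (zero where c_i >= 0)."""
    return [max(0.0, -ci) for ci in c]

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

# Hypothetical data covering the three cases of the argument.
c = [-2.0, 0.5, -0.1, 3.0]
w = [1.0, -4.0, -0.3, 0.0]
v = [min(ci, wi) for ci, wi in zip(c, w)]

componentwise = all(n <= abs(vi) for n, vi in zip(neg_part(c), v))
print(componentwise, norm(neg_part(c)) <= norm(v))
```

Because the inequality holds component by component, it holds for the Euclidean norm and for any other norm that is monotone in the absolute values of the components.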


Multiplier estimates

The  next  results  explore  some  implications  of  the  conditions  on  the  multipliers  given  in  the 
previous  sections,  and  also  present  some  examples  of  estimates  satisfying  these  conditions. 

A  consequence  of  condition  C7  and  the  form  in  which  multipliers  are  updated  is  the 
boundedness of the multipliers in the algorithm. This result is Lemma 4.2 in [GMSW86b].

Lemma 2.4.1. For all $k \ge 1$,

$$\|\lambda_k\| \le \max_{0 \le j \le k-1} \|\mu_j\|,$$

and hence $\|\lambda_k\|$ is bounded for all $k$.

Proof. By definition,

$$\lambda_0 = \mu_0,$$

$$\lambda_{k+1} = \lambda_k + \alpha_k (\mu_k - \lambda_k), \quad k \ge 1. \qquad (2.4.2)$$

The proof is by induction. The result holds for $\lambda_0 = \mu_0$ because of the boundedness of
the multiplier estimate (condition C7). Assume that the lemma holds for $\lambda_k$. From the
definition of $\lambda_{k+1}$ and norm inequalities, we have

$$\|\lambda_{k+1}\| \le \alpha_k \|\mu_k\| + (1 - \alpha_k) \|\lambda_k\|.$$

Since $0 < \alpha_k \le 1$, the inductive hypothesis gives

$$\|\lambda_{k+1}\| \le \max_{0 \le j \le k} \|\mu_j\|,$$

as required. ∎
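The bound in Lemma 2.4.1 follows because each new multiplier is a convex combination of the previous multiplier and the current estimate. A small numerical sketch, with hypothetical estimates `mus` and steplengths `alphas` in $(0, 1]$, illustrates this; the update is applied from the first index on, which is harmless here because the starting multiplier equals the first estimate.

```python
import math

def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

# Hypothetical multiplier estimates mu_k and steplengths alpha_k in (0, 1].
mus = [[1.0, 0.0], [0.0, 3.0], [2.0, 2.0], [0.5, -1.0]]
alphas = [1.0, 0.25, 0.5, 1.0]

lam = list(mus[0])      # starting multiplier equals the first estimate
history = [lam]         # history[k] holds the k-th multiplier vector
for mu, alpha in zip(mus, alphas):
    # Update (2.4.2): a convex combination of lam and mu.
    lam = [l + alpha * (m - l) for l, m in zip(lam, mu)]
    history.append(lam)

# Check ||lambda_k|| <= max over j <= k-1 of ||mu_j|| for every k >= 1.
bound_holds = all(
    norm(history[k]) <= max(norm(mu) for mu in mus[:k]) + 1e-12
    for k in range(1, len(history))
)
print(bound_holds)
```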

Conditions  C7-C9  are  sufficiently  general  to  be  satisfied  by  most  reasonable  estimates, 
as  the  next  lemmas  show.  Nonetheless,  some  attention  must  be  paid  to  the  satisfaction  of 
condition  C7,  concerning  the  boundedness  of  the  estimate,  although  that  boundedness  is 
guaranteed  asymptotically  by  assumption  A3.  In  general,  any  reasonable  scheme  to  limit 
the  norm  of  the  multiplier  estimate  will  not  affect  condition  C8. 

An issue that needs to be mentioned regarding condition C8 is the necessity to identify
the correct active set when $x_k$ is close enough to $x^*$. (Since the problem may have several
solution points, we use $x^*$ in this context to denote the solution closest to $x_k$.) The next
results assume that this is the case, but the formal proof for this property is given in
Chapters 4, 5 and 6, where it will be shown that, independently of C8, if $\|x_k - x^*\|$ is small
enough the correct active set must have been identified. Note that if $\|x_k - x^*\|$ is bounded
away from zero, C8 will be satisfied automatically by any multiplier estimate.

The following candidates for the estimate will be shown to satisfy C8-C9, assuming
that the correct active set has been identified.

(i) The QP multipliers at stationary points found by the algorithm.

(ii) The least-squares multipliers at $x_k$.

(iii) The least-squares multipliers at $x_k + p_k$.

For the following results, let $\{x_k\}$ denote a convergent sequence such that $x_k \to x^*$, a
stationary point for problem NLP with multiplier vector $\lambda^*$. Also, we assume that $\|H_k\|$ is
bounded, and that

$$\|p_k\| = O(\|x_k - x^*\|).$$

In Chapters 4, 5 and 6 it will be shown that this last result holds for the points obtained
by the algorithms considered there.

Lemma 2.4.2. Let $\hat{\mu}_k$ denote the QP multipliers at a stationary point $p_k$ of the QP
subproblem at $x_k$, having the same set of active constraints as $x^*$. If $\|p_k\| = O(\|x_k - x^*\|)$,
then

$$\|\hat{\mu}_k - \lambda^*\| = O(\|x_k - x^*\|).$$

Proof. From the definition of $\hat{\mu}_k$,

$$A_k^T \hat{\mu}_k = H_k p_k + g_k,$$

and from the corresponding Taylor series expansion,

$$A_k^T \hat{\mu}_k = A^{*T} \hat{\mu}_k - \sum_i (\hat{\mu}_k)_i \nabla^2 c_i(x_k)(x^* - x_k) + O(\|x_k - x^*\|^2).$$

From the definition of $\lambda^*$ and the previous equation,

$$A^{*T} (\hat{\mu}_k - \lambda^*) = g_k - g^* + H_k p_k + \sum_i (\hat{\mu}_k)_i \nabla^2 c_i(x_k)(x^* - x_k) + O(\|x_k - x^*\|^2),$$

and again using a Taylor series expansion for $g_k$,

$$A^{*T} (\hat{\mu}_k - \lambda^*) = W_k (x_k - x^*) + H_k p_k + O(\|x_k - x^*\|^2),$$

where $W_k$ denotes the Hessian of the Lagrangian function at $x_k$, defined using $\hat{\mu}_k$ as the
Lagrange multiplier estimate.

From assumptions A2 and A3 and the boundedness of $H_k$ the desired result follows. ∎

The  following  lemma  presents  the  corresponding  results  for  the  least-squares  multiplier 
estimates,  pk. 

Lemma 2.4.3. The least-squares multipliers at $x_k$ satisfy

$$\|\mu_k - \lambda^*\| = O(\|x_k - x^*\|),$$

and assuming $\|x_k + p_k - x^*\| = o(\|x_k - x^*\|)$, the least-squares multipliers at $x_k + p_k$ satisfy

$$\|\mu_k' - \lambda^*\| = o(\|x_k - x^*\|).$$

Proof. From $A_k A_k^T \mu_k = A_k g_k$, $A^{*T} \lambda^* = g^*$ and $A_k = A^* + O(\|x_k - x^*\|)$ it follows that

$$A^* A^{*T}(\mu_k - \lambda^*) = A^*(g_k - g^*) + O(\|x_k - x^*\|) = O(\|x_k - x^*\|),$$

and from the non-singularity of $A^* A^{*T}$ we get

$$\mu_k - \lambda^* = O(\|x_k - x^*\|).$$

For the second case, under the same assumptions as before, if we denote by $A_k'$, $g_k'$ the
corresponding values obtained at $x_k + p_k$, using $A_k' = A^* + O(\|x_k + p_k - x^*\|)$ we have

$$A^* A^{*T}(\mu_k' - \lambda^*) = A^*(g_k' - g^*) + O(\|x_k + p_k - x^*\|) = O(\|x_k + p_k - x^*\|),$$

and from the assumptions,

$$\mu_k' - \lambda^* = O(\|x_k + p_k - x^*\|) = o(\|x_k - x^*\|),$$

completing the proof. ∎
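
The first bound of Lemma 2.4.3 can be illustrated numerically. The sketch below is not taken from the dissertation: it assumes a hypothetical two-variable problem with a single active constraint, for which the least-squares system $A A^T \mu = A g$ reduces to a scalar division. The problem data and the helper names (`least_squares_multiplier`, `error_ratio`) are illustrative assumptions only.

```python
import math

# Hypothetical toy problem (an assumption, not from the dissertation):
#   minimize F(x) = -x1 - x2   subject to   c(x) = 2 - x1^2 - x2^2 >= 0.
# Its solution is x* = (1, 1), with Lagrange multiplier lambda* = 1/2.
X_STAR = (1.0, 1.0)
LAMBDA_STAR = 0.5


def gradient(x):
    """Gradient g(x) of the objective F (constant for this F)."""
    return (-1.0, -1.0)


def jacobian_row(x):
    """Gradient a(x) of the single constraint c, i.e. the 1 x 2 Jacobian A."""
    return (-2.0 * x[0], -2.0 * x[1])


def least_squares_multiplier(x):
    """Solve A A^T mu = A g, a scalar equation for one constraint."""
    a, g = jacobian_row(x), gradient(x)
    return (a[0] * g[0] + a[1] * g[1]) / (a[0] ** 2 + a[1] ** 2)


def error_ratio(t, direction=(1.0, -0.5)):
    """Return |mu - lambda*| / ||x - x*|| at the point x = x* + t * direction."""
    x = (X_STAR[0] + t * direction[0], X_STAR[1] + t * direction[1])
    dist = math.hypot(x[0] - X_STAR[0], x[1] - X_STAR[1])
    return abs(least_squares_multiplier(x) - LAMBDA_STAR) / dist


ratios = [error_ratio(10.0 ** (-j)) for j in range(1, 6)]
for j, r in enumerate(ratios, start=1):
    print(f"t = 1e-{j}:  |mu - lambda*| / ||x - x*|| = {r:.4f}")
```

If the multiplier error is $O(\|x_k - x^*\|)$, the printed ratios should stay bounded as $t$ shrinks, which is what this example is meant to show for the chosen direction.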


Chapter  3 


General  Results 


The  previous  chapter  has  introduced  a  framework  algorithm  to  be  used  in  the  definition 
of  the  three  methods  analyzed  in  the  following  chapters.  The  study  of  these  algorithms 
centers  on  the  determination  of  their  convergence  properties,  that  is,  the  proof  that  they 
are  globally  convergent,  and  the  characterization  of  their  asymptotic  rates  of  convergence. 

Given  the  many  common  features  of  the  different  algorithms,  the  arguments  used  to 
show  these  results  naturally  follow  the  same  general  pattern  and  present  a  considerable 
number  of  similar  steps.  This  chapter  introduces  the  general  structure  shared  by  the  proofs 
developed  in  the  following  chapters,  and  proves  those  results  that  apply  to  all  algorithms, 
because they are independent of the way $H_k$ is defined, the specific details in the
determination of the search direction, etc. In this way, the actual convergence proofs given in the
next  three  chapters  only  need  to  establish  those  results  that  depend  on  the  specific  details 
characterizing  each  one  of  the  algorithms,  and  will  make  use  of  the  general  results  in  this 
chapter  for  those  aspects  that  they  have  in  common. 

The  lemmas  presented  in  the  following  sections  leave  many  unjustified  steps  in  the 
argument  of  the  proofs,  corresponding  to  those  results  that  are  particular  to  each  algorithm. 
These steps are stated as properties, denoted by Px, where "x" is a digit, and they are
assumed  to  hold  for  subsequent  lemmas.  The  convergence  proofs  in  Chapters  4,  5  and  6 
prove  that  these  properties  hold  for  the  different  algorithms.  For  ease  of  reference,  at  the 
end of the chapter we include a list of all the properties introduced.

3.1.  Convergence properties


This section motivates the common structure shared by the convergence proofs in the
following chapters, by presenting the questions these proofs will address. It is important to
remember  that  the  results  presented  in  this  chapter  do  not  try  to  answer  the  questions 
posed  below;  they  only  introduce  a  number  of  basic  results,  to  be  used  in  Chapters  4,  5  and 
6  to  answer  these  questions. 

All of our algorithms generate an infinite sequence $\{x_k\}_{k=0}^{\infty}$ whose limit point is a solution
for the problem. In order to establish global convergence (i.e., independently of the initial
point selected, the algorithm finds a solution for the problem), we want to show that the limit
point of the sequence has certain desired properties. Notice that under assumption A1, the
sequence will always have convergent subsequences. Furthermore, from assumptions A3 and
A6 it is possible to show that the limit point is in fact unique. Proving global convergence
is then equivalent to proving that the limit point is a solution point. In what follows, we
denote the limit point by $x^*$, so that we have $x_k \to x^*$. The proofs in Chapters 4, 5 and 6
will start by examining the properties of $x^*$.

In subsequent chapters we will also determine the rate of convergence of the sequence
$\{\|x_k - x^*\|\}$. Specifically, we will provide answers to the following questions:

•  What is the value of

$$\lim_{k \to \infty} \frac{\|x_{k+m} - x^*\|}{\|x_k - x^*\|^n}$$

when both $n = 1$ and $m = 1$?

•  If the previous answer is zero, is there a value of $n$ with $m = 1$ for which the answer
is finite and strictly positive?

•  If the answer to the first question is not zero, is there a value of $m$ with $n = 1$ for
which the answer is zero?


To characterize the different answers to the previous questions, we say that

(i) the algorithm converges superlinearly (or one-step superlinearly) if

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0;$$

(ii) the algorithm converges two-step superlinearly if

$$\lim_{k \to \infty} \frac{\|x_{k+2} - x^*\|}{\|x_k - x^*\|} = 0;$$

(iii) finally, the algorithm converges quadratically if

$$0 < \lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^2} < \infty.$$
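
The ratios in these definitions can be estimated from a computed sequence of errors. The following is a minimal sketch, assuming the error norms $\|x_k - x^*\|$ are already available as a list; the function name `rate_ratios` and the two test sequences (Newton's method for $x^2 = 2$, and a sequence whose error is halved at each step) are illustrative choices, not part of the original text.

```python
def rate_ratios(errors, m=1, n=1):
    """Return the ratios e[k+m] / e[k]**n for a list of error norms e."""
    return [errors[k + m] / errors[k] ** n for k in range(len(errors) - m)]


# Illustrative sequence 1: Newton's method for x^2 - 2 = 0 (limit sqrt(2)).
root = 2.0 ** 0.5
x, newton_errors = 2.0, []
for _ in range(5):
    newton_errors.append(abs(x - root))
    x = 0.5 * (x + 2.0 / x)

# Illustrative sequence 2: errors that are halved at every step.
linear_errors = [0.5 ** k for k in range(8)]

# With m = 1, n = 1 the Newton ratios shrink towards zero (superlinear).
print("Newton,  n = 1:", rate_ratios(newton_errors))
# With m = 1, n = 2 they settle near a positive constant (quadratic).
print("Newton,  n = 2:", rate_ratios(newton_errors, n=2))
# The halving sequence keeps a constant nonzero ratio (linear only).
print("halving, n = 1:", rate_ratios(linear_errors))
```

A finite list can only suggest the limiting behavior, so the output should be read as evidence for a rate, not as a proof of it.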


A further question of interest is how the penalty parameter $\rho_k$ behaves as $k \to \infty$. A
desirable property for $\rho_k$ is that it remain bounded throughout the algorithm, and in this
chapter we introduce some conditions that guarantee this property.


3.2.  Structure  of  the  proofs 

In  this  section  we  present  and  motivate  the  steps  that  we  will  take  to  obtain  the  answers 
to  the  previous  questions.  These  steps  also  attempt  to  justify  the  results  proved  in  this 
chapter,  so  that  they  can  more  easily  be  put  into  the  framework  of  the  convergence  proofs 
presented in Chapters 4, 5 and 6. Some of the results will be shown to hold in Chapters 4,
5 and 6, while some others are proved in this chapter; we try to indicate for each one of the
statements where the corresponding proof can be found.

(i) A first observation is that the sequence $\{x_k - x^*\}$ is not easy to study, given that part
of the information is available at iteration $k$, but another part, $x^*$, is not known until
the end of the process. It will be seen that the sequence of search directions $\{p_k\}$ can
be studied in its place, and this sequence mimics the behavior of $\{x_k - x^*\}$. This is
done here by proving that

$$\|x_k - x^*\| = O(\|p_k\|), \qquad \|p_k\| = O(\|x_k - x^*\|).$$

(ii) A first step in establishing these relationships is to show that the correct active set at
the solution is identified after a finite number of iterations. To be more precise, for
the different algorithms, and in the corresponding chapters, we prove that if $\|p_k\|$ is
small enough, then the correct active set must have been identified.


(iii) The convergence of the sequence $\{p_k\}$ is proved using the boundedness of the merit
function. In other words, the merit function decreases in each iteration, and the
decrease is related to the value of $\|p_k\|^2$. As the merit function is bounded below,
from assumptions A1 and A2 and Lemma 2.1.1, this implies that $\|p_k\| \to 0$, and
from the previous remarks global convergence follows. This fundamental result is
given in the corresponding chapters for each of the different algorithms.

(iv) To establish the bound on the decrease in the value of the merit function, it is necessary
to start by showing that the search direction is an acceptable descent direction for the
merit function. Again, and to be more precise, what we prove in Chapters 4, 5 and 6
is that for positive constants $\beta_1$ and $\beta_2$,

$$g_k^T p_k + \tfrac{1}{2} p_k^T H_k p_k \le -\beta_1 \|p_k\|^2 + \beta_2 \|c_k^-\|.$$

(v) The descent available for the merit function in any iteration is dependent on the value
chosen for $\rho$. This property is used to select a suitable value for the penalty parameter
in each iteration. This is different from the strategy used in many algorithms, in which
$\rho$ is selected so that the Hessian of the augmented Lagrangian is positive definite at
the solution. All of our algorithms define $\rho$ so that the directional derivative at the
beginning of the linesearch is sufficiently negative, that is, $\phi_k'$ satisfies a condition of
the form

$$\phi_k'(0) \le \cdots$$

but at the same time $\rho$ is not large enough to prevent convergence. The particular
form in which the penalty parameter is defined depends on the algorithm considered,
and so it is left to the corresponding chapters.

(vi) The last requirement to ensure global convergence is to prove that the steplength is
uniformly bounded away from zero. The reason for this condition is that the descent
in the merit function is really bounded by $\|\alpha_k p_k\|^2$, and so in this chapter we establish
that what goes to zero is the norm of the search direction, and not the steplength.

(vii) As a consequence of the global convergence of the algorithms and the conditions
imposed on the estimates $\mu_k$, the Lagrange multiplier estimates $\lambda_k$ also converge to
the correct value.

(viii) Concerning the rate of convergence, the significant remark is that in general the
questions raised earlier have known answers for the sequence $\{x_k + p_k - x^*\}$. The
proofs given in the following chapters have two parts: in one we show that eventually
a unit steplength is always accepted, and so the previous sequence is the relevant
one for this question, and in the other we establish the corresponding results for this
sequence.

(ix) A final issue is the study of conditions under which the penalty parameter remains
bounded throughout the algorithm. Using the previous results, we introduce at the
end of the chapter some conditions that imply this property.

The next sections present results that are common to the proofs for all three methods,
along the lines indicated above.

3.3.  Properties of the search direction

The first group of results explores the relationship of stationary points for the QP
subproblems and stationary points for problem NLP. The significance of this relationship is due
to the fact that the search direction is obtained from information available at a stationary
point of the QP subproblem. The results shown below are similar in spirit to those in
Robinson [Rob74]. They will be used in subsequent chapters to show that the value of $\|p_k\|$
is "small" if and only if we are close to a solution point, with corresponding implications
regarding the identification of the correct active set.

Lemma 3.3.1. For any $x \in \Omega$, let $p$ be a stationary point for the QP subproblem at $x$.
Then

$$\forall \epsilon > 0 \;\; \exists \delta > 0 \;\text{ such that }\; \|p\| < \delta \;\Rightarrow\; \|x - \bar{x}\| < \epsilon,$$

where $\bar{x}$ is a stationary point for the nonlinear program NLP with the same set of active
constraints as $p$, or $\bar{x}$ is a feasible point where the Jacobian of the active constraints is
singular.

Proof. Assume that the result does not hold; then there exist sequences $\{p_k\}_{k=1}^{\infty}$ and
$\{x_k\}_{k=1}^{\infty}$, such that $p_k$ is a stationary point for the QP subproblem at $x_k$ satisfying $\|p_k\| \to 0$,
and $\|x_k - \bar{x}\| \ge \epsilon$ for some $\epsilon > 0$ and all $\bar{x}$ with the previous properties.

A convergent subsequence can be extracted from $\{x_k\}$, using the compactness of $\Omega$.
Select now a sub-subsequence having fixed active set, a subset of the active set at the limit
point $\hat{x}$.

If we take limits in

$$A_k p_k + c_k \ge 0$$

and apply assumption A2, it immediately follows that $\hat{x}$ must be feasible.

If the Jacobian of the active constraints is non-singular at $\hat{x}$, from

$$H_k p_k + g_k = A_k^T \mu_k$$

there will exist a subsequence along which $\{\mu_k\}$ converges, $\mu_k \to \hat{\mu}$. Taking limits along
this subsequence,

$$\hat{g} = \hat{A}^T \hat{\mu}.$$

This result implies that $\hat{x}$ is a stationary point for the nonlinear problem, contradicting the
assumption.

To show that the set of active constraints should be the same for $p$ and $\hat{x}$, in the case
when the Jacobian at $\hat{x}$ is non-singular, assume that sequences as described above exist, but
that the set of active constraints at each $p_k$ is not the same as the set of active constraints
at $\hat{x}$. As $\|p_k\| \to 0$, the set of active constraints at each $p_k$ must be a subset of the active
constraints at $\hat{x}$; but if it is a proper subset, then there must exist an index $i$, active at
$\hat{x}$, such that $\mu_{ki} = 0$ for large enough $k$, and this will imply $\hat{\mu}_i = 0$, violating the strict
complementarity assumption. ∎

The assumptions on the form of the problem guarantee that large enough steps can be
taken  from  stationary  points  in  the  QP  subproblems  when  the  points  considered  are  not 
close  to  solutions  for  the  problem.  The  algorithm  makes  use  of  this  property  to  move  away 
from  stationary  points  for  NLP.  The  next  result  establishes  the  existence  of  some  of  the 
necessary  bounds. 

Lemma 3.3.2. There exist positive values $\beta_{spc}$, $\beta_{spm}$, $\beta_{spn}$, such that for all stationary
points $x$,

$$\min_{i:\, c_i > 0} c_i \ge \beta_{spc},$$

for those stationary points having some negative multiplier element,

$$\max_i \, (-\mu_i) \ge \beta_{spm},$$

and for those stationary points that have a non-negative multiplier vector, but are not
second-order KKT points,

$$\max_i \, (-\lambda_i) \ge \beta_{spn},$$

where $\lambda_i$ denotes the $i$th eigenvalue for the reduced Hessian at $x$.

Proof. Assume that there exists a sequence $\{x_k\}$ of stationary points for problem NLP in
$\Omega$ such that

$$\min_{i:\, c_{ki} > 0} c_{ki} \to 0.$$

From the compactness of $\Omega$, a convergent subsequence can be extracted having fixed
active set, and such that the minimum is always achieved for the same constraint (or set
of constraints). Let $x^*$ denote the limit point, which will also be a stationary point for
the problem (or will have a singular Jacobian for the active constraints, except we exclude
this case by invoking assumption A3). At $x^*$ assumption A5 will be violated, as the
corresponding constraints are active but have zero multipliers.

If the sequence is such that

$$\max_i \, (-\mu_{ki}) \to 0,$$

using the same construction, assumption A5 will again be violated at $x^*$, since at least one
of the multipliers corresponding to an active constraint will be zero.

Finally, if

$$\max_i \, (-\lambda_{ki}) \to 0$$

for a sequence of first-order KKT points, the limit point will be a second-order KKT point
but assumption A6 will be violated, as the reduced Hessian will be singular. ∎

Using  the  previous  lemmas,  in  Chapters  4,  5  and  6  we  establish  the  following  property 
for  the  different  algorithms: 

P1. There exists a value $\epsilon' > 0$ such that if $\|p_k\| \le \epsilon'$, then the correct active set at
a solution of problem NLP has been identified, and $p_k$ is a minimizer for the QP
subproblem.

In  what  follows,  we  assume  that  this  property  holds. 


3.4.  Equivalence of sequences

For a given sequence $\{x_k\}$, the next results establish the equivalence between the sequences
$\{x_k - x^*\}$ and $\{p_k\}$, allowing us to continue the study of the convergence properties for the
algorithms on the sequence of search directions.

Lemma 3.4.1. If $x^*$ denotes the solution point closest to $x_k$, then there exists a constant
$M_p$, independent of $k$, such that

$$\|x_k - x^*\| \le M_p \|p_k\|. \tag{3.4.1}$$

Proof. The proof is in essence the one for Lemma 4.1 in [GMSW86b], and takes the
following form. Let $c$ denote the vector of constraints active at $x^*$, let $A$ be the Jacobian of
the active constraints, and $Z$ an orthogonal basis for the null space of $A$. Define

$$h(x) = \begin{pmatrix} c(x) \\ Z(x)^T g(x) \end{pmatrix}.$$

Expanding $h_i(x)$ about $x^*$, and noting that $h(x^*) = 0$, we obtain

$$h_i(x) = H_i(\theta_i)(x - x^*),$$

for $H_i(\theta_i) = \nabla h_i(x^* + \theta_i(x - x^*))$, where $0 < \theta_i < 1$ (see Goodman [Go85], for a discussion
of the definition of $H_i$). Define $S_\theta$ as the matrix whose rows are given by $H_i(\theta_i)$. Then

$$\begin{pmatrix} c(x) \\ Z(x)^T g(x) \end{pmatrix} = S_\theta (x - x^*). \tag{3.4.2}$$

Assume that $\|p_k\| \le \epsilon'$ for suitably small $\epsilon'$, so that property P1 applies and the smallest
singular value of the reduced Hessian of the Lagrangian function is bounded below. From
assumption A5, $S_\theta$ is nonsingular, with smallest singular value uniformly bounded below
(see, e.g., Robinson [Rob74]). Because of assumption A1, the relation (3.4.1) is immediate
if $\|p_k\| > \epsilon'$, and we henceforth consider only iterations $k$ such that $\|p_k\| \le \epsilon'$.

Taking $x = x_k$ in (3.4.2), and using the nonsingularity of $S_\theta$ and norm inequalities, we
obtain

$$\|x_k - x^*\| \le \bar{\beta}\,(\|c_k\| + \|Z_k^T g_k\|) \tag{3.4.3}$$

for some bounded $\bar{\beta}$. We now seek an upper bound on the right-hand side of this equation.
Since the solution for the QP subproblem identifies the correct active set, $p_k$ satisfies the
equations

$$A_k p_k = -c_k \quad \text{and} \quad Z_k^T H_k p_k = -Z_k^T g_k.$$

From these equations, assumption A3 and the positive definiteness of the reduced Hessian,
it follows that there must exist a constant $\beta > 0$ such that

$$\beta\,(\|c_k\| + \|Z_k^T g_k\|) \le \|p_k\|. \tag{3.4.4}$$

Since $\bar{\beta}$ and $\beta$ are independent of $k$, combining (3.4.3) and (3.4.4) gives the desired result. ∎


The converse statement is proved in the next lemma. This result is not strictly necessary
for the convergence proof, but it is included for completeness, and because it simplifies
certain arguments. It also requires certain additional assumptions, whose validity will be
established in the following chapters. In particular, if $Z_k$ denotes a basis for the null space
of the Jacobian at $x_k$ corresponding to the constraints active at $x^*$ (defined in the same
way as before), then the sequence $\{Z_k^T H_k Z_k\}$ must be bounded, and any limit point, say
$Z^{*T} H^* Z^*$, must be positive definite.

Lemma 3.4.2. Let $x^*$ denote the solution point closest to $x_k$. If any limit of the sequence
$\{Z_k^T H_k Z_k\}$ is positive definite, then there exists a constant $M_x$, independent of $k$, such that

$$\|p_k\| \le M_x \|x_k - x^*\|.$$

Proof. We start by showing that whenever $\|x_k - x^*\| \to 0$, we must also have $\|p_k\| \to 0$.

Assume that this is not the case. Then there exists a sequence $\{p_k\}$ obtained from QP
subproblems at points $\{x_k\}$ satisfying $x_k \to x^*$, and such that $\|p_k\| > \epsilon$ for all $k$ and some
$\epsilon > 0$.

Also, there must exist a first QP step $d_k$ along the way to $p_k$, satisfying $\|d_k\| > \bar{\epsilon}$, where
$\bar{\epsilon} > 0$ and all previous steps converge to zero. Define

$$\delta_k = \bar{\epsilon}\, \frac{d_k}{\|d_k\|},$$

so that $\delta_k$ is a feasible QP step. Extract a subsequence along which both $Z_k^T H_k Z_k$ and $\delta_k$
have a limit. Then, if $\bar{p}_k$ denotes the step taken in the QP subproblem immediately before
obtaining $d_k$,

$$(H_k \bar{p}_k + g_k)^T d_k \le 0,$$

and taking limits we obtain

$$g^{*T} \delta^* \le 0 \;\Rightarrow\; \lambda^{*T} A^* \delta^* \le 0,$$

but from strict complementarity and feasibility it must hold that $A^* \delta^* = 0$. Again, taking
limits in

$$\psi_k(\bar{p}_k) - \psi_k(\bar{p}_k + d_k) \ge 0$$

we must have

$$\delta_Z^{*T} Z^{*T} H^* Z^* \delta_Z^* \le 0,$$

contradicting the assumption that $Z^{*T} H^* Z^*$ is positive definite, so $p^* = 0$.

This result implies that there exists a $\bar{\delta} > 0$ such that for all $\delta \le \bar{\delta}$,

$$\|x_k - x^*\| \le \delta \;\Rightarrow\; \|p_k\| \le \epsilon',$$

where $\epsilon'$ is the value in property P1, $p_k$ is obtained as the solution of the QP subproblem
and the correct active set has been identified.

If $\|x_k - x^*\| > \delta$, the result follows trivially. Assume that $\|x_k - x^*\| \le \delta$. Then, as in
the proof for Lemma 3.4.1, from (3.4.2) and the boundedness of $S_\theta$ we get

$$\|x_k - x^*\| \ge \beta'\,(\|c_k\| + \|Z_k^T g_k\|). \tag{3.4.5}$$

Also, from the nonsingularity of $A^*$ and $Z_k^T H_k Z_k$ for large $k$, for small enough $\|x_k - x^*\|$
we have, given that $p_k$ is obtained as a minimizer of the QP subproblem,

$$\bar{\beta}'\,(\|c_k\| + \|Z_k^T g_k\|) \ge \|p_k\|. \tag{3.4.6}$$

Combining (3.4.5) and (3.4.6) gives the desired result. ∎
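
The two-sided relationship established by Lemmas 3.4.1 and 3.4.2 can be checked numerically on a small example. The sketch below is not from the dissertation. It assumes a hypothetical problem with one constraint, treats that constraint as active in the QP subproblem, and takes $H = I$, which happens to equal the Hessian of the Lagrangian at the solution of this particular problem. The helper names `qp_step` and `step_to_distance_ratio` are illustrative.

```python
import math

# Hypothetical toy problem (an assumption, not from the dissertation):
#   minimize F(x) = -x1 - x2   subject to   c(x) = 2 - x1^2 - x2^2 >= 0,
# with solution x* = (1, 1), where the constraint is active.
X_STAR = (1.0, 1.0)


def qp_step(x):
    """Solve  min g^T p + 0.5 p^T p  s.t.  a^T p + c = 0  (H = I, constraint active)."""
    g = (-1.0, -1.0)                    # gradient of F
    a = (-2.0 * x[0], -2.0 * x[1])      # gradient of c
    c = 2.0 - x[0] ** 2 - x[1] ** 2     # constraint value
    # Stationarity p + g = a * mu, combined with a^T p = -c, gives mu directly.
    mu = (a[0] * g[0] + a[1] * g[1] - c) / (a[0] ** 2 + a[1] ** 2)
    return (-g[0] + a[0] * mu, -g[1] + a[1] * mu)


def step_to_distance_ratio(x):
    """Return ||p|| / ||x - x*|| for the QP step p computed at x."""
    p = qp_step(x)
    return math.hypot(p[0], p[1]) / math.hypot(x[0] - X_STAR[0], x[1] - X_STAR[1])


ratios = []
for direction in ((1.0, -0.5), (1.0, 1.0), (-0.3, 1.0)):
    for t in (1e-1, 1e-2, 1e-3):
        x = (X_STAR[0] + t * direction[0], X_STAR[1] + t * direction[1])
        ratios.append(step_to_distance_ratio(x))

print(f"min ratio = {min(ratios):.4f}, max ratio = {max(ratios):.4f}")
```

For the sampled points the ratio $\|p_k\| / \|x_k - x^*\|$ stays within fixed positive bounds, which is the behavior that the constants $M_p$ and $M_x$ describe. A check on a handful of points is of course only an illustration of the lemmas, not a substitute for them.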

The previous lemmas justify replacing the study of the sequence of distances to the
solution set by the sequence of search directions. A result that is closely associated to the
last two lemmas, and that completes the justification for the study of the sequence $\{p_k\}$, is
given by the following property that, as in the previous case, will be assumed to hold for
the rest of the chapter, and is proved in the following chapters.

P2. $\|p_k\| = 0$ if and only if $x_k$ is a solution for problem NLP.

It  should  be  remembered  from  the  remarks  in  Chapter  1  that  the  meaning  of  a  solution 
for  problem  NLP  depends  on  the  algorithm  used,  but  in  any  case  it  is  either  a  first-order 
or  a  second-order  KKT  point. 

It was mentioned before that under assumption A6 the sequence generated by the
algorithm has a unique limit point. The next lemma proves this result.

Lemma 3.4.3. If $\|p_k\| \to 0$ and $x_{k+1}$ is obtained as $x_{k+1} = x_k + \alpha_k p_k$, $0 < \alpha_k \le 1$, then
the sequence $\{x_k\}$ has a limit $x^*$, a solution point for the problem.

Proof. From assumption A1 and Lemma 3.4.1, it holds that any limit point for the
sequence is a solution point. If there exists a unique limit point for the sequence, the proof is
complete. Assume then that there exists more than one limit point.

From

$$\|x_{k+1} - x_k\| = \alpha_k \|p_k\| \to 0$$

it follows that the limit points cannot be isolated. To prove this, assume that we do have
isolated solutions, and in particular that there exists a limit point $x^*$ and a positive value
$\epsilon$ such that for any other limit point $\bar{x}$ we have $\|x^* - \bar{x}\| > \epsilon$.

Let $\{x_{k_i}\}$ denote a subsequence converging to $x^*$, and such that $\{x_{k_i+1}\}$ is convergent,
but its limit point $\bar{x}$ is different from $x^*$. Select $i$ large enough to have

$$\|x_{k_i} - x^*\| < \frac{\epsilon}{3}, \qquad \|x_{k_i+1} - \bar{x}\| < \frac{\epsilon}{3}, \qquad \|x_{k_i} - x_{k_i+1}\| < \frac{\epsilon}{3}.$$

We can then write

$$\|x_{k_i} - x_{k_i+1}\| \ge \|x^* - \bar{x}\| - \|x_{k_i} - x^*\| - \|x_{k_i+1} - \bar{x}\| \;\Rightarrow\; \|x^* - \bar{x}\| < \epsilon,$$

but this contradicts the previous assumption.

If limit points are not isolated, select one of them, $x^*$, and construct a sequence of limit
points $\{\bar{x}_k\}$ converging to $x^*$. From the previous remarks, as all limit points must be solution
points,

$$F(\bar{x}_k) = L(\bar{x}_k) = L(x^*) = F(x^*).$$

Notice that all solution points must have the same active set, from strict complementarity
and nonsingularity of the Jacobian at all limit points, implying that the terms $\lambda^T c$ are zero
in all cases.

Define

$$d_k = \frac{\bar{x}_k - x^*}{\|\bar{x}_k - x^*\|}$$

and select a convergent subsequence having limit point $d^*$. From the Taylor series expansion
for the active constraints,

$$c(\bar{x}_k) = 0 = c(x^*) + A^* d_k \|\bar{x}_k - x^*\| + O(\|\bar{x}_k - x^*\|^2),$$

which implies that for any active constraint $i$,

$$0 = a_i^{*T} d_k + O(\|\bar{x}_k - x^*\|) \;\Rightarrow\; a_i^{*T} d^* = 0,$$

and $d^*$ must be in the null space of the active constraints at $x^*$.

For the Lagrangian function we can write

$$\nabla L(\bar{x}_k) = \nabla L(x^*) + \nabla^2 L(x^*)(\bar{x}_k - x^*) + o(\|\bar{x}_k - x^*\|).$$

Using the property that all points considered are solutions for the problem, and so their
Lagrangian functions have zero gradients,

$$0 = \nabla^2 L(x^*) d_k + o(1) \;\Rightarrow\; \nabla^2 L(x^*) d^* = 0,$$

but this contradicts assumption A6, and the sequence must have a unique limit point. ∎
Descent  properties 

As a consequence of Lemma 3.4.1, to prove that the algorithm is globally convergent it is
enough to show that $p_k \to 0$. This result follows from the boundedness of the merit function,
and the fact that the merit function decreases in each iteration by at least a fixed
multiple of $\|p_k\|^2$. The first step along this line of reasoning will be to
establish  that  pk  satisfies  certain  descent  properties.  These  properties  can  be  considered  to 
be  related  to  the  well  known  condition  for  global  convergence  in  unconstrained  optimization, 
that  the  angle  between  the  gradient  and  the  search  direction  must  be  bounded  away  from 
orthogonality.  The  explicit  form  of  the  condition  to  be  used  is  given  (and  assumed  to  hold) 
in  the  next  paragraph. 

P3. There exist constants $\beta_1 > 0$, $\beta_2 > 0$ such that the incomplete solution for the QP
subproblem, $p_k$, satisfies

$$g_k^T p_k + \tfrac{1}{2} p_k^T H_k p_k \le -\beta_1 \|p_k\|^2 + \beta_2 \|c_k^-\|.$$


3.5.  The  penalty  parameter 

The  penalty  parameter  in  the  algorithm  is  modified  so  that  at  each  iteration  it  is  possible 
to  decrease  the  value  of  the  merit  function  by  a  sufficiently  large  amount.  Chapters  4,  5 
and 6 include proofs for the following property, and specific definitions for the value of the
penalty parameter ensuring that the desired decrease can be achieved.

P4. There exists a value $\bar{\rho}_k$ such that for some positive constant $\beta_H$, independent of the
iteration,

$$\phi_k'(0, \rho) \le -\beta_H \|p_k\|^2$$

for all $\rho \ge \bar{\rho}_k$.

We will also assume that the sequence $\{\rho_k\}$ is nondecreasing.

In the case when the reduced Hessian is indefinite, a slightly different condition, also
proved in Chapter 6, is used; in the modified condition $\phi_k'(0, \rho)$ is replaced by $\phi_k''(0, \rho)$. The
alterations that this change introduces in the results to follow will not be discussed here;
they are studied in detail in Chapter 6.

Whenever $\rho$ is mentioned in the results that follow, what is meant is not the actual value
of the penalty parameter, but rather the value of the bound $\bar{\rho}$ from condition P4. All the
results still hold if this value is replaced by a bounded multiple, $\rho \le K \bar{\rho}$, for some $K > 1$.
Also, we need to impose a condition on how often the value of the penalty parameter will
be updated. It will be assumed that there exists a positive constant $\bar{\beta}_H > \beta_H$ such that no
update is performed whenever $\phi_k'(0, \rho) \le -\bar{\beta}_H \|p_k\|^2$.

3.6.  Boundedness  of  the  steplength 

The  rest  of  the  global  convergence  proof  consists  in  showing  that  the  steplength  is  bounded 
away  from  zero,  and  so  the  potential  decrease  implied  by  the  bound  in  P4  and  (2.2.3)  is 
actually attained.

A first result, whose proof depends on the form of $\bar{\rho}_k$ and $\beta_H$ introduced in the following
chapters, where it will be justified, gives a first bound for the rate at which the penalty
parameter is allowed to increase in the algorithm. Tighter bounds will be introduced in
subsequent lemmas.

P5. For any iteration $k_l$ in which the value of $\rho$ is modified,

$$\rho_{k_l} \|p_{k_l}\|^2 \le N$$

and

$$\rho_{k_l} \|c_{k_l} - s_{k_l}\| \le N$$

for some constant $N$.

The notation $k_l$ is used in all that follows to indicate iterations at which the value of
the penalty parameter needs to be modified.

We now introduce an expression for $\phi'(0)$ that will be used extensively in the proofs of
results related to the behavior of the merit function. To derive it, consider first the gradient
of $L_A$ with respect to $x$, $\lambda$ and $s$,

$$\nabla L_A(x, \lambda, s) = \begin{pmatrix} g(x) - A(x)^T \lambda + \rho A(x)^T (c(x) - s) \\ -(c(x) - s) \\ \lambda - \rho\,(c(x) - s) \end{pmatrix}. \tag{3.6.1}$$

It follows that $\phi'(0)$ is given by

$$\begin{aligned} \phi'(0) &= p^T g - p^T A^T \lambda + \rho\, p^T A^T (c - s) - (c - s)^T \xi + \lambda^T q - \rho\, q^T (c - s) \\ &= p^T g + (2\lambda - \mu)^T (c - s) - \rho \|c - s\|^2, \end{aligned} \tag{3.6.2}$$

where $g$, $A$, and $c$ are evaluated at $x$.
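
Expression (3.6.2) can be verified against a finite-difference approximation. The sketch below rests on several assumptions that are not stated in this chapter: the merit function is taken to be $L_A = F - \lambda^T(c - s) + \frac{1}{2}\rho\,(c - s)^T(c - s)$, the multiplier direction is $\xi = \mu - \lambda$, and the slack direction is $q = Ap + c - s$. The objective, the constraint and all numerical values are hypothetical.

```python
import math

RHO = 10.0  # penalty parameter (illustrative value)


def F(x):
    """Hypothetical objective function."""
    return math.exp(x[0]) + x[0] * x[1] + x[1] ** 2


def grad_F(x):
    return (math.exp(x[0]) + x[1], x[0] + 2.0 * x[1])


def c(x):
    """Hypothetical single constraint c(x) >= 0."""
    return 2.0 - x[0] ** 2 - x[1] ** 2


def grad_c(x):
    return (-2.0 * x[0], -2.0 * x[1])


def merit(x, lam, s):
    """Assumed merit function: F - lam*(c - s) + 0.5*rho*(c - s)^2."""
    r = c(x) - s
    return F(x) - lam * r + 0.5 * RHO * r * r


x, lam, s = (0.8, 1.3), 0.4, 0.1    # current point, multiplier estimate, slack
p, mu = (0.3, -0.2), 0.7            # arbitrary x-step and multiplier target
a, g = grad_c(x), grad_F(x)
q = a[0] * p[0] + a[1] * p[1] + c(x) - s   # assumed slack step: A p - q = -(c - s)
xi = mu - lam                               # assumed multiplier step


def phi(alpha):
    """Merit function along the search direction (p, xi, q)."""
    x_new = (x[0] + alpha * p[0], x[1] + alpha * p[1])
    return merit(x_new, lam + alpha * xi, s + alpha * q)


r = c(x) - s
formula = p[0] * g[0] + p[1] * g[1] + (2.0 * lam - mu) * r - RHO * r * r
h = 1e-6
finite_difference = (phi(h) - phi(-h)) / (2.0 * h)
print(f"formula (3.6.2): {formula:.6f}   central difference: {finite_difference:.6f}")
```

Under these assumptions the two printed values should agree to several digits. The cancellation that produces the second line of (3.6.2) comes from the relation $Ap - q = -(c - s)$, which is why the slack step is defined that way here.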

The following results, analogous to those in [GMSW86b], complete the proof for the
boundedness of the steplength. These results start by proving the boundedness of certain
quantities, related to the penalty parameter, that appear in the termination conditions
for the linesearch; these results provide refined bounds for the rate at which the penalty
parameter  may  increase  with  respect  to  the  ones  given  in  property  P5,  once  this  property 
is  assumed  to  hold.  In  all  these  results  it  must  be  remembered  that  there  exist  two  cases 
regarding  the  behavior  of  the  penalty  parameter  p.  It  may  remain  bounded  throughout  the 
algorithm,  in  which  case  the  results  follow  trivially,  or  it  may  need  to  be  increased  in  an 
infinite  number  of  iterations.  This  last  case  is  the  one  addressed  by  the  next  lemmas. 

Lemma 3.6.1. For all iterations $k_l$ at which the penalty parameter has to be modified,

$$c_{k_l}^T \mu_{k_l} \le K \|p_{k_l}\|^2 + (2\lambda_{k_l} - \mu_{k_l})^T (c_{k_l} - s_{k_l}),$$

where $\mu_{k_l}$ denotes the QP multipliers at $p_{k_l}$, and $K$ is a positive constant.

Proof. In the proof we drop the subscript $k_l$. If $\|p\| > \epsilon'$, the result follows from the
assumptions and the boundedness of the multiplier estimate. Otherwise, from P1 the
search direction must have been obtained as a solution for the QP subproblem, implying
that

$$g^T p + p^T H p = -c^T \mu. \tag{3.6.3}$$

Also, if $\rho^-$ denotes the value of the parameter before being modified,

$$\phi'(\rho^-) > -\bar{\beta}_H \|p\|^2, \tag{3.6.4}$$

and from the definition of $\phi'$,

$$c^T \mu < -p^T H p + \bar{\beta}_H \|p\|^2 + (c - s)^T (2\lambda - \mu) - \rho^- (c - s)^T (c - s).$$

From the non-negativity of $\rho^- (c - s)^T (c - s)$ and the boundedness of $H$ the desired result
follows. ∎

Lemma  3.6.2.  There  exists  a  constant  M  such  that  for  all  /, 

Pk,  (c \(Pk,)  ~  <Ati+1  (Pk, ))  <  M.  (3.6.5) 

Proof. To simplify notation in this proof, we shall use the subscripts 0 and $K$ to denote
quantities associated with iterations $k_l$ and $k_{l+1}$ respectively. Thus, the penalty parameter
is increased at $x_0$ and $x_K$ in order to satisfy condition P4, and remains fixed at $\rho_0$ for
iterations $1, \ldots, K - 1$.

From the definition of $\phi$,

$$\rho_0 \phi = \rho_0 F - \rho_0 \lambda^T (c - s) + \tfrac{1}{2} \rho_0^2 (c - s)^T (c - s). \tag{3.6.6}$$

Also, property P5 implies

$$\rho_0 \|c_0 - s_0\| \le M \quad \text{and} \quad \rho_K \|c_K - s_K\| \le M.$$

Since $\|\lambda\|$ is bounded (Lemma 2.1.1), the only term in (3.6.6) that might become unbounded
is $\rho_0 F$. The desired relation (3.6.5) then follows if an upper bound exists for $\rho_0 (F_0 - F_K)$.
Consider iterations for which $\|p_0\| \le \epsilon$, so that property P1 applies (for all other
iterations $\rho$ is bounded, and the result holds from assumption A2). In this case, $p_0$ is obtained
as a solution for the QP subproblem. Let $\mu_0$ denote the QP multipliers corresponding to
$p_0$.

Expanding $F$ about $x_0$, we have

$$F_K = F_0 + (x_K - x_0)^T g_0 + O(\|x_0 - x_K\|^2). \tag{3.6.7}$$

Similarly, if we expand $c$ about $x_0$, we obtain

$$c_K = c_0 + A_0 (x_K - x_0) + O(\|x_0 - x_K\|^2). \tag{3.6.8}$$


From Lemma 3.4.1,

$$\|x_0 - x^*\| \le M \|p_0\| \quad \text{and} \quad \|x_K - x^*\| \le M \|p_K\|,$$
and substituting the expression $g_0 = A_0^T \mu_0 - H_0 p_0$ and (3.6.8) in (3.6.7), we obtain

$$F_0 - F_K = (c_0 - c_K)^T \mu_0 + O(\max(\|p_0\|^2, \|p_K\|^2)).$$

We thus seek to bound

$$\rho_0 (F_0 - F_K) = \rho_0 c_0^T \mu_0 - \rho_0 c_K^T \mu_0 + \rho_0 \, O(\max(\|p_0\|^2, \|p_K\|^2)). \tag{3.6.9}$$

To derive a bound for the first term on the right-hand side of (3.6.9), Lemma 3.6.1 can
be used to write

$$\rho_0 c_0^T \mu_0 \le \rho_0 K \|p_0\|^2 + \rho_0 (c_0 - s_0)^T (2\lambda_0 - \mu_0). \tag{3.6.10}$$

Because $\rho_0 \|c_0 - s_0\|$, $\rho_0 \|p_0\|^2$, $\|\lambda_0\|$ and $\|\mu_0\|$ are bounded, from (3.6.10) we conclude
that

$$\rho_0 c_0^T \mu_0 \le M. \tag{3.6.11}$$

Consider now the second term on the right-hand side of (3.6.9). If $c_K^-$ denotes the
negative parts for all components of $c_K$, from $\mu_0 \ge 0$ we must have

$$-\rho_0 c_K^T \mu_0 \le \rho_0 (c_K^-)^T \mu_0,$$

and from (2.1.1) we have

$$\|c_K^-\| \le \|c_K - s_K\|. \tag{3.6.12}$$

Using property P5 and the relation $\rho_0 \le \rho_K$, we conclude that

$$-\rho_0 c_K^T \mu_0 \le M. \tag{3.6.13}$$


Finally, consider the third term on the right-hand side of (3.6.9). It follows from property
P5 and the relation $\rho_0 \le \rho_K$ that

$$\rho_0 \|p_0\|^2 \le N \quad \text{and} \quad \rho_0 \|p_K\|^2 \le N,$$

and hence

$$\rho_0 \, O(\max(\|p_0\|^2, \|p_K\|^2)) \le M. \tag{3.6.14}$$

Combining (3.6.11), (3.6.13) and (3.6.14), we obtain the bound

$$\rho_0 (F_0 - F_K) \le 3M,$$

which implies the desired result. |


Lemma 3.6.3. There exists a constant $M$ such that, for all $l$,

$$\rho_{k_l} \sum_{k = k_l}^{k_{l+1} - 1} \|\alpha_k p_k\|^2 < M. \tag{3.6.15}$$

Proof. As in the previous lemma, we use the subscripts 0 and $K$ to denote quantities
associated with iterations $k_l$ and $k_{l+1}$ respectively. For $0 \le k \le K - 1$, property (2.2.1a)
imposed by the choice of $\alpha_k$, and the fact that the penalty parameter is not increased, imply
that

$$\phi_k - \phi_{k+1} \ge -\sigma \alpha_k \phi'_k. \tag{3.6.16}$$

We can use the identity

$$\phi_0 - \phi_K = \sum_{k=0}^{K-1} (\phi_k - \phi_{k+1}) \tag{3.6.17}$$

together with equations (3.6.17), (3.6.16) and property P4 to obtain

$$\tfrac{1}{2} \sigma \beta_w \sum_{k=0}^{K-1} \alpha_k \|p_k\|^2 \le \phi_0 - \phi_K.$$

Rearranging this expression and using the property that $0 < \alpha_k \le 1$, we obtain

$$\tfrac{1}{2} \sigma \beta_w \sum_{k=0}^{K-1} \|\alpha_k p_k\|^2 \le \phi_0 - \phi_K. \tag{3.6.18}$$

The result follows by multiplying (3.6.18) by $\rho_0$ and using Lemma 3.6.2. |

Lemma 3.6.4. There exists a constant $M$ such that, for all $k$,

$$\rho_k \|c_k - s_k\| \le M. \tag{3.6.19}$$

Proof. Using the notation of the two previous lemmas, observe that (3.6.19) is immediate
from property P5 for $k = 0$ and $k = K$.

To verify a bound for $k = 1, \ldots, K - 1$ (iterations at which the penalty parameter is
not increased), we first consider $x_1$. Let unbarred and barred quantities denote evaluation
at $x_0$ and $x_1$ respectively.

If $\bar c_i \ge \bar\lambda_i / \rho_0$, then

$$\rho_0 |\bar c_i - \bar s_i| = |\bar\lambda_i|,$$

and the bound follows from Lemma 2.1.1.

If $\bar c_i < \bar\lambda_i / \rho_0$, then $\bar s_i = 0$. If in addition $\bar c_i \ge 0$, then

$$\rho_0 |\bar c_i - \bar s_i| = \rho_0 \bar c_i < \bar\lambda_i,$$

and the same result applies.

Therefore, assume that $\bar c_i < 0$, $\bar c_i < \bar\lambda_i / \rho_0$, and expand the $i$th constraint function
around $x_0$:

$$\bar c_i = c_i + \alpha_0 a_i^T p_0 + O(\|\alpha_0 p_0\|^2). \tag{3.6.20}$$

Rewriting the previous expression, we obtain:

$$\bar c_i = \bar c_i - \bar s_i = (1 - \alpha_0) c_i + \alpha_0 (a_i^T p_0 + c_i) + O(\|\alpha_0 p_0\|^2). \tag{3.6.21}$$

Adding and subtracting $(1 - \alpha_0) s_i$ on the right-hand side of (3.6.21) gives

$$\bar c_i - \bar s_i = (1 - \alpha_0)(c_i - s_i) + (1 - \alpha_0) s_i + \alpha_0 (a_i^T p_0 + c_i) + O(\|\alpha_0 p_0\|^2). \tag{3.6.22}$$

The properties of $\alpha_0$, $s_i$ and $a_i^T p_0 + c_i$ imply that

$$(1 - \alpha_0) s_i + \alpha_0 (a_i^T p_0 + c_i) \ge 0,$$

and when $\bar c_i < \min(0, \bar\lambda_i / \rho_0)$, (3.6.22) gives the following inequality:

$$\rho_0 |\bar c_i - \bar s_i| \le \rho_0 (1 - \alpha_0) |c_i - s_i| + \rho_0 \, O(\|\alpha_0 p_0\|^2). \tag{3.6.23}$$

There are two cases to consider in analyzing (3.6.23). First, when $c_i \ge 0$, or $c_i \ge \lambda_i / \rho_0$,
the term $\rho_0 |c_i - s_i|$ is bounded above, using the same arguments as before. The second term
on the right-hand side of (3.6.23) is bounded above, using Lemma 3.6.3. Thus, the desired
bound

$$\rho_0 |\bar c_i - \bar s_i| \le M$$

follows if $c_i \ge \min(0, \lambda_i / \rho_0)$. Extending this reasoning to the sequence $k = 1, \ldots, K - 1$,
we see that the quantity $\rho_0 |c_i(x_k) - s_i(x_k)|$ is bounded whenever $c_i(x_k) \ge \min(0, (\lambda_k)_i / \rho_0)$
or $c_i(x_{k-1}) \ge \min(0, (\lambda_{k-1})_i / \rho_0)$.

Consequently, the only remaining case involves components of $c$ that are negative and
have $s_i = 0$ at two or more consecutive iterations. Let $\tilde c$ denote the subvector of such
components of $c$. Using the componentwise inequality (3.6.23) and the fact that $0 < \alpha \le 1$,
we have

$$\rho_0 \|\tilde c(x_1) - \tilde s(x_1)\| \le \rho_0 \|\tilde c(x_0) - \tilde s(x_0)\| + \rho_0 \, O(\|\alpha_0 p_0\|^2).$$

If we proceed over the relevant sequence of iterations, the following inequality must hold
for $k = 1, \ldots, K - 1$:

$$\rho_0 \|\tilde c(x_k) - \tilde s(x_k)\| \le \rho_0 \|\tilde c(x_0) - \tilde s(x_0)\| + \rho_0 \, O\Big( \sum_{j=0}^{k-1} \|\alpha_j p_j\|^2 \Big). \tag{3.6.24}$$

The result then follows by applying property P5 and Lemma 3.6.3 to (3.6.24). |

The next two lemmas establish the existence of a linesearch step bounded away from
zero, independent of $k$ and the size of $\rho$, for which a sufficient-decrease condition is satisfied.

Lemma 3.6.5. For $0 \le \theta \le \alpha_k$,

$$\phi''(\theta) \le -\phi'(0) + N \|p_k\|^2,$$

where $N$ is a constant independent of $k$.

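Proof. We again drop the subscript $k$. From (3.6.1),

$$\nabla^2 L = \begin{pmatrix} \nabla^2 F - \sum_i (\lambda_i + \rho (c_i - s_i)) \nabla^2 c_i + \rho A^T A & -A^T & -\rho A^T \\ -A & 0 & I \\ -\rho A & I & \rho I \end{pmatrix},$$

so that

$$\phi''(\theta) = p^T W(\theta) p - \sum_i \rho (c_i(\theta) - s_i(\theta)) \, p^T \nabla^2 c_i(\theta) p + \rho (A(\theta) p - q)^T (A(\theta) p - q) - 2 \xi^T (A(\theta) p - q), \tag{3.6.25}$$

where

$$W(\theta) = \nabla^2 F(\theta) - \sum_i (\lambda_i + \theta \xi_i) \nabla^2 c_i(\theta).$$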

We now derive bounds on the first two terms on the right-hand side of (3.6.25). The
first term is bounded in magnitude by a constant multiple of $\|p\|^2$ because of assumption
A2 and the boundedness of $\|\lambda\|$ (from Lemma 2.4.1). For the second term, we expand $c_i$
in a Taylor series about $x$:

$$c_i(x + \theta p) = c_i(x) + \theta a_i(x)^T p + \tfrac{1}{2} \theta^2 p^T \nabla^2 c_i(x + \theta_i p) p,$$

where $0 \le \theta_i \le \theta$. Since $s_i(\theta) = s_i + \theta q_i$, using (2.2.2) and multiplying by $\rho$, we have

$$\rho \left( c_i(x + \theta p) - (s_i + \theta q_i) \right) = \rho (1 - \theta)(c_i(x) - s_i) + \tfrac{1}{2} \rho \theta^2 p^T \nabla^2 c_i(x + \theta_i p) p.$$

We know from Lemma 3.6.4 that $\rho |c_i(x) - s_i|$ is bounded, and Lemma 3.6.3 implies that
$\rho \|\alpha p\|^2$ is bounded. Therefore,

$$\rho |c_i(\theta) - s_i(\theta)| \le \bar J, \tag{3.6.26}$$

where $\bar J$ is a constant independent of the iteration. Using (3.6.26), we obtain the overall
bound

$$\sum_i \left| \rho (c_i(\theta) - s_i(\theta)) \, p^T \nabla^2 c_i(\theta) p \right| \le J \|p\|^2, \tag{3.6.27}$$

where $J$ is a constant independent of the iteration.

Now we examine the third term on the right-hand side of (3.6.25). Using Taylor series,
we have

$$a_i(x + \theta p)^T p = a_i^T p + \theta p^T \nabla^2 c_i(\theta_i) p, \tag{3.6.28}$$

where $0 \le \theta_i \le \theta$. Using (2.2.2) and Lemmas 3.6.3 and 3.6.4, we obtain

$$\rho (A(\theta) p - q)^T (A(\theta) p - q) \le \rho (c - s)^T (c - s) + L \|p\|^2, \tag{3.6.29}$$

where $L$ is a constant independent of the iteration.

From (3.6.28) and the boundedness of $\|\xi\|$ (Lemma 2.4.1), the final term on the
right-hand side of (3.6.25) can be written as

$$-2 \xi^T (A(\theta) p - q) \le 2 \xi^T (c - s) + M \|p\|^2, \tag{3.6.30}$$

where $M$ is a constant independent of the iteration.

We now observe that

$$\rho (c - s)^T (c - s) + 2 \xi^T (c - s) = -\phi'(0) + p^T g + \mu^T (c - s) = -\phi'(0) + p^T (g - A^T \mu) - \mu^T s,$$

and using Taylor expansions we obtain

$$p^T (g - A^T \mu) = p^T (g^* - A^{*T} \mu) + O(\|p\|^2) = p^T A^{*T} (\lambda^* - \mu) + O(\|p\|^2).$$

Condition C8 on the multipliers implies that there exists a constant $M > 0$ such that

$$p^T (g - A^T \mu) \le M \|p\|^2.$$

From $\mu_k \to \lambda^*$, strict complementarity at the solution, and the fact that the correct active
set is identified for $\|p\|$ small enough (property P1), we eventually have $\mu \ge 0$ and $\mu^T s \ge 0$.
From (3.6.29), (3.6.30) and the last results, we have

$$\rho (A(\theta) p - q)^T (A(\theta) p - q) - 2 \xi^T (A(\theta) p - q) \le -\phi'(0) + M' \|p\|^2. \tag{3.6.31}$$

Combining (3.6.27), (3.6.29) and (3.6.31) gives the required result. |

Lemma 3.6.6. The linesearch of the algorithm defines a step length $\alpha \in (0, 1]$ such that

$$\phi(\alpha) - \phi(0) \le \sigma \alpha \phi'(0) \tag{3.6.32}$$

and $\alpha \ge \bar\alpha$, where $0 < \sigma < 1$ and $\bar\alpha > 0$ is bounded away from zero and independent of the
iteration.

Proof. If condition (2.2.3) is satisfied at a given iteration, then $\alpha = 1$ and (3.6.32) holds
with $\alpha$ trivially bounded away from zero.

Assume that (2.2.3) does not hold (i.e., $\alpha$ is computed by safeguarded cubic interpolation).
The existence of a step length $\alpha$ that satisfies conditions (2.2.4) is guaranteed from
standard analysis (see, for example, Moré and Sorensen [MS84]). We need to show that $\alpha$
is uniformly bounded away from zero. There are two cases to consider.

From the assumption that (2.2.3) does not hold, $\phi(1) - \phi(0) > \sigma \phi'(0)$. Since $\phi'(0) < 0$,
there must exist at least one positive zero of the function

$$\psi(\alpha) = \phi(\alpha) - \phi(0) - \sigma \alpha \phi'(0).$$

Let $\alpha^*$ denote the smallest such zero. Since $\psi$ vanishes at zero and $\alpha^*$, and $\psi'(0) < 0$, the
mean-value theorem implies the existence of a point $\hat\alpha$ ($0 < \hat\alpha < \alpha^*$) such that $\psi'(\hat\alpha) = 0$,
i.e., for which

$$\phi'(\hat\alpha) = \sigma \phi'(0).$$

Because $\sigma \le \eta$, it follows that

$$\phi'(\hat\alpha) - \eta \phi'(0) = (\sigma - \eta) \phi'(0) \ge 0.$$

Therefore, since the function $\phi'(\alpha) - \eta \phi'(0)$ is negative at $\alpha = 0$ and non-negative at $\hat\alpha$, the
mean-value theorem again implies the existence of a smallest value $\bar\alpha$ ($0 < \bar\alpha \le \hat\alpha$) such that

$$\phi'(\bar\alpha) = \eta \phi'(0). \tag{3.6.33}$$

The point $\bar\alpha$ is the required lower bound on the step length because (3.6.33) implies that
(2.2.4b) will not be satisfied for any $\alpha \in [0, \bar\alpha)$.

Expanding $\phi'$ in a Taylor series gives

$$\phi'(\bar\alpha) = \phi'(0) + \bar\alpha \phi''(\theta),$$

where $0 < \theta < \bar\alpha$. Therefore, using (3.6.33) and noting that $\eta < 1$ and $\phi'(0) < 0$, we obtain

$$\bar\alpha = \frac{\phi'(\bar\alpha) - \phi'(0)}{\phi''(\theta)} = (1 - \eta) \frac{|\phi'(0)|}{\phi''(\theta)}. \tag{3.6.34}$$

(Since $\bar\alpha > 0$, $\theta$ must be such that $\phi''(\theta) > 0$.) We seek a lower bound on $\bar\alpha$, and hence an
upper bound on the denominator of (3.6.34). We know from Lemma 3.6.5 that for some
positive constant $N$

$$\phi''(\theta) \le -\phi'(0) + N \|p\|^2 = |\phi'(0)| + N \|p\|^2,$$

implying

$$\bar\alpha \ge (1 - \eta) \frac{|\phi'(0)|}{|\phi'(0)| + N \|p\|^2}.$$

Dividing by $|\phi'(0)|$ gives

$$\bar\alpha \ge \frac{1 - \eta}{1 + N \|p\|^2 / |\phi'(0)|}. \tag{3.6.35}$$

From property P4 it follows that

$$|\phi'(0)| \ge \tfrac{1}{2} \beta_w \|p\|^2,$$

and thus, the denominator of (3.6.35) may be bounded above as follows:

$$1 + \frac{N \|p\|^2}{|\phi'(0)|} \le 1 + \frac{2 N \|p\|^2}{\beta_w \|p\|^2} = 1 + \frac{2N}{\beta_w}.$$

A uniform lower bound on $\bar\alpha$ is accordingly given by

$$\bar\alpha \ge \frac{\beta_w (1 - \eta)}{\beta_w + 2N}, \tag{3.6.36}$$

satisfying the condition. |
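
The sufficient-decrease condition (3.6.32) can be illustrated numerically. The Python sketch below is only an illustration under assumptions that are not part of the text: plain step halving stands in for the safeguarded cubic interpolation of the algorithm, and the one-dimensional merit functions are hypothetical quadratics chosen so that $\phi'(0) = -1$. It returns the first trial step, starting from the unit step, that satisfies $\phi(\alpha) - \phi(0) \le \sigma \alpha \phi'(0)$.

```python
def backtracking_step(phi, dphi0, sigma=0.1, shrink=0.5, max_halvings=60):
    """Return alpha in (0, 1] with phi(alpha) - phi(0) <= sigma*alpha*dphi0.

    Assumption: plain step halving replaces the safeguarded cubic
    interpolation used by the algorithm described in the text.
    """
    if dphi0 >= 0.0:
        raise ValueError("phi'(0) must be negative (descent direction)")
    phi0 = phi(0.0)
    alpha = 1.0  # the unit step is always tried first
    for _ in range(max_halvings):
        if phi(alpha) - phi0 <= sigma * alpha * dphi0:
            return alpha
        alpha *= shrink
    raise RuntimeError("no acceptable step found")


def make_quadratic(curvature):
    """Hypothetical merit function phi(a) = 0.5*curvature*a**2 - a, so phi'(0) = -1."""
    return lambda a: 0.5 * curvature * a * a - a


alpha_mild = backtracking_step(make_quadratic(1.0), dphi0=-1.0)
alpha_steep = backtracking_step(make_quadratic(8.0), dphi0=-1.0)
```

For these quadratics the condition holds exactly when $\alpha \le 2(1 - \sigma)/\phi''$, so a larger curvature forces a smaller accepted step; this is the ratio of $|\phi'(0)|$ to the curvature that the proof of Lemma 3.6.6 bounds uniformly.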

From these results global convergence follows, as given by the following property, to be
proved in the corresponding chapters.

P6. For the sequence generated by the algorithm,

$$\lim_{k \to \infty} \|x_k - x^*\| = 0,$$

where $x^*$ is a solution point for the problem.
3.7. Convergence of Lagrange multiplier estimates

Once the global convergence of the algorithm has been established, the next step is to show
that the multiplier estimate $\lambda_k$ also converges to the desired value. The result presented
below, given as Theorem 4.2 in [GMSW86b], implies that the convergence of the multiplier
estimates is a consequence of the global convergence of the algorithm, and the facts that
the multiplier estimates are bounded in norm, and the steplength is bounded away from
zero.

Lemma 3.7.1. Assume that P6 holds, and let $\lambda^*$ denote the multiplier vector at $x^*$.
Assume also that there exists a positive value $\bar\alpha$ such that the steplength at any iteration is
bounded away from zero: $\alpha_k \ge \bar\alpha > 0$. Then

$$\lim_{k \to \infty} \|\lambda_k - \lambda^*\| = 0.$$


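Proof. From (2.4.2),

$$\lambda_{k+1} = \sum_{j=0}^{k} \gamma_{jk} \mu_j, \tag{3.7.1}$$

where

$$\gamma_{kk} = \tilde\alpha_k, \qquad \gamma_{jk} = \tilde\alpha_j \prod_{r=j+1}^{k} (1 - \tilde\alpha_r), \quad j < k, \tag{3.7.2}$$

with $\tilde\alpha_0 = 1$ and $\tilde\alpha_j = \alpha_j$, $j \ge 1$. (This convention is used because of the special initial
condition that $\lambda_0 = \mu_0$.) From the boundedness of $\alpha$ and (3.7.2), we observe that

$$0 < \bar\alpha \le \tilde\alpha_j \le 1 \quad \text{for all } j, \tag{3.7.3a}$$

$$\sum_{j=0}^{k} \gamma_{jk} = 1, \tag{3.7.3b}$$

$$\gamma_{jk} \le (1 - \bar\alpha)^{k-j}, \quad j < k. \tag{3.7.3c}$$

From condition C8 on the multipliers we must have

$$\mu_k = \lambda^* + M_k d_k t_k \tag{3.7.4}$$

with $|M_k| \le M$, $d_k = \|x_k - x^*\|$ and $\|t_k\| = 1$. From property P6, $K_1$ can be chosen so
that, for $k \ge K_1$,

$$|M_k d_k| \le \tfrac{1}{2} \epsilon. \tag{3.7.5}$$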

We can also define an iteration index $K_2$ with the following property:

$$(1 - \bar\alpha)^k \le \frac{\epsilon}{2 (k + 1)(1 + \beta_{nmu} + \|\lambda^*\|)} \tag{3.7.6}$$

for $k \ge K_2 + 1$, where $\beta_{nmu}$ is an upper bound on $\|\mu_k\|$ for all $k$. Let $K = \max(K_1, K_2)$.
Then, from (3.7.1) and (3.7.4), we have for $k \ge 2K$,

$$\lambda_{k+1} = \sum_{j=0}^{K} \gamma_{jk} \mu_j + \sum_{j=K+1}^{k} \gamma_{jk} (\lambda^* + M_j d_j t_j).$$

Hence it follows from (3.7.3b) that

$$\lambda_{k+1} - \lambda^* = \sum_{j=0}^{K} \gamma_{jk} (\mu_j - \lambda^*) + \sum_{j=K+1}^{k} \gamma_{jk} M_j d_j t_j.$$

From the bounds on $\|\mu_j\|$ and $\|t_j\|$ we then obtain

$$\|\lambda_{k+1} - \lambda^*\| \le (\beta_{nmu} + \|\lambda^*\|) \sum_{j=0}^{K} \gamma_{jk} + \sum_{j=K+1}^{k} \gamma_{jk} |M_j d_j|. \tag{3.7.7}$$

Since $k \ge 2K$, it follows from (3.7.3a) and (3.7.3c) that

$$\sum_{j=0}^{K} \gamma_{jk} \le \sum_{j=0}^{K} (1 - \bar\alpha)^{k-j} \le \sum_{j=0}^{K} (1 - \bar\alpha)^{2K-j} \le (K + 1)(1 - \bar\alpha)^K.$$

Using (3.7.6), we thus obtain the following bound for the first term on the right-hand side
of (3.7.7):

$$(\beta_{nmu} + \|\lambda^*\|) \sum_{j=0}^{K} \gamma_{jk} \le \tfrac{1}{2} \epsilon. \tag{3.7.8}$$

To bound the second term in (3.7.7), we use (3.7.3b) and (3.7.5):

$$\sum_{j=K+1}^{k} \gamma_{jk} |M_j d_j| \le \sum_{j=K+1}^{k} \gamma_{jk} \, \tfrac{1}{2} \epsilon \le \tfrac{1}{2} \epsilon. \tag{3.7.9}$$

Combining (3.7.7)-(3.7.9), we obtain the following result: given any $\epsilon > 0$, we can find $K$
such that

$$\|\lambda_k - \lambda^*\| \le \epsilon \quad \text{for } k \ge 2K + 1,$$

which implies the convergence result. |
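
The weighted-sum representation used in this proof can be checked numerically. The Python sketch below is an illustration only, with two labeled assumptions: the multiplier update referred to as (2.4.2) is taken to be $\lambda_{j+1} = \lambda_j + \alpha_j (\mu_j - \lambda_j)$ with $\lambda_0 = \mu_0$, and the steps and QP multipliers are hypothetical scalar data. It builds the weights $\gamma_{jk}$ of (3.7.2), with the first step replaced by 1, and compares the weighted sum (3.7.1) with the result of the recurrence.

```python
def multiplier_weights(alphas):
    """Weights gamma_jk of (3.7.2) for k = len(alphas) - 1.

    alphas[j] is the step taken at iteration j; the first entry is
    replaced by 1 to reflect the initial condition lambda_0 = mu_0.
    """
    a = [1.0] + list(alphas[1:])
    k = len(a) - 1
    weights = []
    for j in range(k + 1):
        w = a[j]
        for r in range(j + 1, k + 1):
            w *= 1.0 - a[r]
        weights.append(w)
    return weights


def multiplier_by_recurrence(alphas, mus):
    """lambda_{k+1} from the assumed update lambda + alpha*(mu - lambda)."""
    lam = mus[0]  # lambda_0 = mu_0
    for alpha, mu in zip(alphas, mus):
        lam = lam + alpha * (mu - lam)
    return lam


# Hypothetical data: scalar QP multipliers mu_j and steps alpha_j.
alphas = [0.5, 1.0, 0.25, 0.5]
mus = [2.0, 4.0, 8.0, 16.0]

gammas = multiplier_weights(alphas)
lam_sum = sum(g * m for g, m in zip(gammas, mus))
lam_rec = multiplier_by_recurrence(alphas, mus)
```

The weights are non-negative and add up to one, as in (3.7.3b), so each multiplier estimate is a convex combination of past QP multipliers in which old terms are damped geometrically whenever the steps stay bounded away from zero.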
3.8. Unit steplength

As mentioned before, the determination of the rate of convergence for the algorithm proceeds
in two steps. One is to show that a unit steplength is always accepted for all $k$ large enough;
the basic results used for this proof are introduced in this section, although the result will
be proved in the corresponding chapters. The other step is to determine the convergence
rate of the sequence $\{x_k + p_k - x^*\}$. This will be done in Chapters 4, 5 and 6.

The following lemmas determine the limiting behavior of certain subsequences related
to the penalty parameter $\rho$. Again, for the case in which the penalty parameter remains
bounded the results follow immediately, so their interest lies in the case when $\rho$ is assumed
to be unbounded.

The first result is an extension of property P5, and its meaning is again to obtain a
better bound for the rate at which the penalty parameter may increase, once we know
that the algorithm is globally convergent. As before, its proof is left to the corresponding
chapters.

P7. For iterations $k_l$ in which the penalty parameter is increased, assuming an infinite
sequence of such iterations exists,

$$\lim_{l \to \infty} \rho_{k_l} \|p_{k_l}\|^2 = 0$$

and

$$\lim_{l \to \infty} \rho_{k_l} \|c_{k_l} - s_{k_l}\| = 0.$$

Other results, extensions of those given in the previous sections, and providing refinements
on the rate of increase for $\rho$, are presented in the next lemmas.

Lemma 3.8.1. If there exists an infinite subsequence $\{k_l\}$, then

$$\rho_{k_l} \left( \phi_{k_l}(\rho_{k_l}) - \phi_{k_{l+1}}(\rho_{k_l}) \right) \to 0.$$

Proof. We use the same notation as in the proof of Lemma 3.6.2. From the boundedness
of $\|\lambda\|$ (Lemma 2.1.1), and the fact that $\rho_0 \le \rho_K$, we have

$$\rho_0 |\lambda_0^T (c_0 - s_0)| \le \|\lambda_0\| \, \rho_0 \|c_0 - s_0\| \to 0,$$

$$\rho_0 |\lambda_K^T (c_K - s_K)| \le \|\lambda_K\| \, \rho_K \|c_K - s_K\| \to 0,$$
and from property P7 we have

$$\rho_0 (\phi_0 - \phi_K) - \rho_0 (F_0 - F_K) \to 0. \tag{3.8.1}$$

Using (3.6.10),

$$\rho_0 K \|p_0\|^2 + \rho_0 (c_0 - s_0)^T (2\lambda_0 - \mu_0) \ge \rho_0 c_0^T \mu_0 \ge \rho_0 (c_0 - s_0)^T \mu_0. \tag{3.8.2}$$

Using again property P7, from (3.8.2) and assumption A3, implying the boundedness of
$\|\mu_0\|$, we get

$$\rho_0 c_0^T \mu_0 \to 0. \tag{3.8.3}$$

From (2.1.1) and (3.6.12) (keeping the same notation),

$$-\rho_0 c_K^T \mu_0 \le \rho_0 (c_K^-)^T \mu_0 \le \rho_0 \|\mu_0\| \|c_K - s_K\| \to 0. \tag{3.8.4}$$

For the last term in (3.6.9), we can again use property P7 to obtain

$$\rho_0 \, O(\max(\|p_0\|^2, \|p_K\|^2)) \to 0. \tag{3.8.5}$$

From (3.8.1), (3.8.3), (3.8.4) and (3.8.5) we obtain

$$\rho_0 (\phi_0 - \phi_K) \to 0,$$

giving the desired result. |

Lemma 3.8.2. For general iterations $k$,

$$\lim_{k \to \infty} \rho_k \|p_k\|^2 = 0.$$

Proof. If $\rho$ is bounded, the result follows from property P6 and Lemma 3.4.2. If $\rho$ is
increased in an infinite subsequence of iterations, then from (3.6.18) and Lemma 3.6.6,

$$\rho_0 \sum_{k=0}^{K-1} \|p_k\|^2 \le \frac{2}{\sigma \beta_w \bar\alpha^2} \, \rho_0 (\phi_0 - \phi_K)$$
and the result follows from Lemma 3.8.1. |

Lemma  3.8.3.  For  general  iterations  k, 
Proof. If $\rho$ is bounded the result follows from $c^* \ge 0$, $\lambda^* \ge 0$, $\lambda^{*T} c^* = 0$, property P6,
Lemma 3.7.1 and

$$c_i - s_i = \min\left( c_i, \frac{\lambda_i}{\rho} \right).$$

If $\rho$ is increased in an infinite subsequence of iterations, consider two cases:

(i) If $i$ is such that $c_i^* > 0$, then $\lambda_i^* = 0$ and as

$$\rho |c_i - s_i| = |\min(\rho c_i, \lambda_i)|,$$

from the convergence of the multiplier estimates, eventually $\rho |c_i - s_i| = |\lambda_i| \to 0$.

(ii) For those $i$ such that $c_i^* = 0$, implying $\lambda_i^* > 0$, consider iteration indices large enough
so that the correct active set is identified, implying $a_i^T p + c_i = 0$. Then, from the
Taylor series expansion for $c$ (3.6.20) and Lemma 3.6.6 (using the same notation as
in Lemma 3.6.4),

$$\bar c_i = c_i + \alpha_0 a_i^T p + O(\|\alpha_0 p_0\|^2) = (1 - \alpha_0) c_i + O(\|p_0\|^2).$$

Recurring this relationship for the $k$th step between $k = 0$ and $k = K$ we get

$$\rho_k (c_k)_i = \rho_0 (c_k)_i = \rho_0 \prod_{j=0}^{k-1} (1 - \alpha_j) (c_0)_i + \rho_0 \, O\Big( \sum_{j=0}^{k-1} \|p_j\|^2 \Big),$$

but as $0 < \alpha_j \le 1$ we obtain

$$\rho_k |(c_k)_i| \le \rho_0 |(c_0)_i| + \rho_0 \, O\Big( \sum_{j=0}^{k-1} \|p_j\|^2 \Big). \tag{3.8.6}$$

From property P7 we must have that $\rho_0 |(c_0)_i| \to 0$, and using (3.8.6) and Lemma 3.8.2,

$$\rho_k |(c_k)_i| \to 0.$$

This completes the proof. |

Another  relationship  that  will  be  needed  in  the  following  chapters  is  proved  in  the  next 
lemma. 

Lemma 3.8.4. For large enough $k$,

$$\mu_k^T s_k = 0.$$

Proof. Assume $k$ large enough so that the correct active set has been identified.

(i) If $i$ is such that $c_i^* > 0$, from condition C9 on the multipliers, $(\mu_k)_i = 0$.

(ii) If $i$ is such that $c_i^* = 0$, then, from strict complementarity, $\lambda_i^* > 0$. Also, from
Lemma 3.8.3, $\rho_k ((c_k)_i - (s_k)_i) = \min(\rho_k (c_k)_i, (\lambda_k)_i) \to 0$, so for large enough $k$, Lemma 3.7.1
will imply $\rho_k (c_k)_i < (\lambda_k)_i$ and

$$(s_k)_i = \max\left( 0, (c_k)_i - \frac{(\lambda_k)_i}{\rho_k} \right) = 0,$$

proving the result. |
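
The relation between the slack variables and the quantity $\rho (c_i - s_i)$ used in the last two proofs can be checked directly. The Python sketch below is an illustration under stated assumptions: it takes the slack definition $s_i = \max(0, c_i - \lambda_i / \rho)$ quoted in the proof above, and the constraint value and multiplier estimate are hypothetical numbers.

```python
def slack(c_i, lam_i, rho):
    """Slack value s_i = max(0, c_i - lam_i/rho) for a penalty parameter rho > 0."""
    return max(0.0, c_i - lam_i / rho)


def scaled_residual(c_i, lam_i, rho):
    """rho*(c_i - s_i), which equals min(rho*c_i, lam_i)."""
    return rho * (c_i - slack(c_i, lam_i, rho))


# Hypothetical constraint value and multiplier estimate near an active constraint.
c_i, lam_i = 1e-3, 0.5
for rho in (1.0, 10.0, 100.0):
    assert abs(scaled_residual(c_i, lam_i, rho) - min(rho * c_i, lam_i)) < 1e-12

small_rho_slack = slack(c_i, lam_i, 100.0)   # rho*c_i = 0.1 < lam_i, so s_i = 0
large_rho_slack = slack(c_i, lam_i, 1000.0)  # rho*c_i = 1.0 > lam_i, so s_i > 0
```

The slack is zero exactly when $\rho c_i \le \lambda_i$, which is the situation the proof of Lemma 3.8.4 shows must eventually hold for the active constraints.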

Using the previous lemmas, the following property will be established in Chapters 4, 5
and 6:

P8. There exists an iteration index $\bar k$ such that for all indices $k \ge \bar k$ the unit steplength is
accepted: $\alpha_k = 1$.

The following chapters make use of these results to establish the rates of convergence of
the corresponding algorithms.

3.9.  Boundedness  of  the  penalty  parameter 

The  main  consideration  in  the  definition  of  the  penalty  parameter  p  is  to  ensure  that  the 
directional  derivative  (or  the  curvature  along  the  linesearch)  is  sufficiently  negative.  This 
strategy  leaves  open  the  possibility  that  the  value  of  the  penalty  parameter  may  be  forced  to 
grow  without  bounds  to  satisfy  this  condition  as  the  algorithm  progresses.  Notice  that  for 
the  convergence  and  rate  of  convergence  proofs  the  boundedness  of  the  penalty  parameter 
is  irrelevant;  it  is  only  from  the  point  of  view  of  the  practical  behavior  of  the  algorithm  that 
we  may  want  to  have  p  bounded. 

This section presents conditions that suffice to guarantee that the penalty parameter
remains bounded. The required conditions can be given either in terms of the properties of
the multiplier estimates, or in terms of the behavior of the ratios $\|p_Y\| / \|p_Z\|$ (or both). The
study of the sequence of ratios for quasi-Newton methods is not simple, and the conditions
presented here are given only in terms of the properties of the multipliers.

The following lemma proves the basic result concerning the behavior of the penalty
parameter. The notation $\bar\mu_k$ is used for the QP multiplier at iteration $k$.
Lemma 3.9.1. Consider an iteration index $\bar k$ such that for all iterations with $k \ge \bar k$ both
properties P1 and P8 hold. If

$$\|2\mu_{k-1} - \mu_k - \bar\mu_k\| = O(\|p_k\|),$$

then there exists a finite value $\bar\rho$ such that

$$\phi'_k(\bar\rho) \le -\tfrac{1}{2} \beta_w \|p_k\|^2$$

for all $k \ge \bar k$.

Proof. From the definition of $\phi'$, (3.6.2), and the fact that $p_k$ is obtained as a solution for
the QP subproblem, we have

$$\phi'(0) = -p^T H p + (2\lambda - \mu - \bar\mu)^T (c - s) - \bar\mu^T s - \rho \|c - s\|^2.$$

Also,  from  the  correct  identification  of  the  active  set  and  property  P8. 


Using Lemma 3.8.4 we can write

$$\phi'(0) = -p^T H p + (2\lambda - \mu - \bar\mu)^T c - \rho \|c\|^2, \tag{3.9.1}$$

where $c$ now denotes a vector where all the entries corresponding to the inactive constraints
are zero.

From $A Y p_Y = -c$ and the non-singularity of $AY$ (assume $k$ large enough, and use
assumption A3), there must exist positive constants $\beta_1$ and $\beta_2$, independent of the iteration,
such that

$$\|c\| \le \beta_1 \|p_Y\| \quad \text{and} \quad \|p_Y\| \le \beta_2 \|c\|.$$

The arithmetic mean/geometric mean inequality implies that for any $y, z \ge 0$ and $\gamma > 0$,

$$yz \le \frac{\gamma}{2} y^2 + \frac{1}{2\gamma} z^2. \tag{3.9.2}$$

Using this result, we can write for an adequate $\beta_3$,
Also, from property P8 and the assumption on the form of $\|2\mu_{k-1} - \mu_k - \bar\mu_k\|$,

C-'-V-  -  i‘k  -  Ok)1*-  <  aMIpIIIMI  <  /^IIpIIII/a  II  <  \pV/jTu %i>z  +  ihWps-W2- 

Combining  these  results,  we  obtain 


o\ 0 )  <  -±przZ‘/IZpz  +  Jrllprlr  -  <  -iplz'HZpz  -(p-  f37tf 

and  if  wo  select  p  >  JrJj,  the  desired  result  follows.  | 
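
For reference, the weighted form of the arithmetic mean/geometric mean inequality invoked in this proof follows from expanding a square. The short derivation below is a standard argument added for illustration and is not taken from the text; $\gamma$ is any positive weight.

```latex
% For y, z >= 0 and any gamma > 0:
0 \le \left( \sqrt{\gamma}\, y - \frac{z}{\sqrt{\gamma}} \right)^2
  = \gamma y^2 - 2yz + \frac{z^2}{\gamma}
\quad\Longrightarrow\quad
yz \le \frac{\gamma}{2}\, y^2 + \frac{1}{2\gamma}\, z^2 .
```

The free weight $\gamma$ controls how a product of two norms is split between two squared terms, which is the kind of freedom a bound on a cross term needs.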

Note that if the multiplier estimate is such that

$$\|\mu_k - \lambda^*\| = O(\|x_k + p_k - x^*\|),$$

the condition in Lemma 3.9.1 is satisfied. Lemma 2.1.3 establishes this property for the
least-squares multipliers at $x_k + p_k$, providing an example of a multiplier estimate whose
use guarantees the boundedness of the penalty parameter.


3.10.  Summary 

The goal of this chapter has been to present the structure of the convergence proofs to be
completed in the following chapters, and to establish those results that are common to the
proofs for the different algorithms. The steps in the proofs that depend on the specific
implementation of the different algorithms have been left to be shown in the corresponding
chapters. These steps are collected below so that they can be more easily referenced.

The next chapters prove that the following results hold for the corresponding algorithms:

P1. There exists a value $\epsilon > 0$ such that if $\|p_k\| \le \epsilon$, then the correct active set at
a solution of problem NLP has been identified, and $p_k$ is a minimizer for the QP
subproblem.

P2. $\|p_k\| = 0$ if and only if $x_k$ is a solution for NLP.

P3. There exist constants $\beta_1 > 0$, $\beta_2 > 0$ such that the incomplete solution for the QP
subproblem, $p_k$, satisfies

$$g_k^T p_k + \tfrac{1}{2} p_k^T H_k p_k \le -\beta_1 \|p_k\|^2 + \beta_2 \|c_k^-\|.$$


P4. There exists a value $\hat\rho_k$ such that, for some positive constant $\beta_w$, independent of the
iteration,

$$\phi'_k(\rho) \le -\tfrac{1}{2} \beta_w \|p_k\|^2$$

for all $\rho \ge \hat\rho_k$.

P5. For any iteration $k_l$ in which the value of $\rho$ is modified,

$$\rho_{k_l} \|p_{k_l}\|^2 \le N$$

and

$$\rho_{k_l} \|c_{k_l} - s_{k_l}\| \le N$$

for some constant $N$.

P6. For the sequence generated by the algorithm,

$$\lim_{k \to \infty} \|x_k - x^*\| = 0,$$

where $x^*$ is a solution point for the problem.

P7. For iterations $k_l$ in which the penalty parameter is increased, assuming an infinite
sequence of such iterations exists,

$$\lim_{l \to \infty} \rho_{k_l} \|p_{k_l}\|^2 = 0$$

and

$$\lim_{l \to \infty} \rho_{k_l} \|c_{k_l} - s_{k_l}\| = 0.$$

P8. There exists an iteration index $\bar k$ such that for all iteration indices $k \ge \bar k$ a unit
steplength is accepted: $\alpha_k = 1$.

The theorems where the corresponding rates of convergence are established will also be
proved in Chapters 4, 5 and 6.


Chapter  4 

Positive  Definite  Approximations 
to  the  Hessian 

4.1.  Introduction 

In this chapter we study the convergence properties of an SQP algorithm, defined along the
lines of the framework algorithm introduced in Chapter 2, and such that $H_k$ is constructed
to be positive definite. The algorithm is very similar to the one implemented in the code
NPSOL, as described in [GMSW86a], with the difference that the search direction in a
given iteration is computed as an "incomplete solution" for the quadratic subproblem. An
incomplete solution in this chapter will be a feasible point for the subproblem obtained
according to the rules indicated in Chapter 2.

The  goals  for  this  chapter  can  be  summarized  as  being 

•  the  derivation  of  a  global  convergence  proof  for  the  algorithm,  following  the  lines 
indicated  in  Chapter  3;  and 

•  the  identification  of  additional  conditions  that  need  to  be  imposed  to  attain  superlinear 
convergence,  and  the  proof  that  the  algorithm  achieves  this  rate  of  convergence. 

The steps needed for these proofs have already been presented in Chapter 3, where those
intermediate results that are independent of the definition of $H_k$ have also been shown. To
complete the proofs, this chapter need only establish those results that depend on the form
of $H_k$, properties P1-P8.

4.2. Definition of the algorithm

The main point left to be specified in the description of the framework algorithm in Chapter
2 is the form of the approximation to the Hessian of the Lagrangian function, $H_k$. The
condition on $H_k$ that is assumed to hold in this chapter, and that should be added to
conditions C1-C9, is:

C10. The matrices $H_k$ used in the construction of the QP subproblems are positive definite
and bounded, with bounded condition number.

This assumption is identical to the one made for NPSQP. In practice, such a sequence may
be generated (see [GMSW86a]) by updating a quasi-Newton approximation to the Hessian
of the Lagrangian function in each iteration.

From this condition, some quantities will be uniformly bounded in the algorithm. The
notation introduced below is used throughout the chapter for these bounds.

• $\beta_{lvH}$ is an upper bound for the largest eigenvalue of $H$: $p^T H p \le \beta_{lvH} \|p\|^2$.

• $\beta_{svH}$ is a positive lower bound for the smallest eigenvalue of $H$: $p^T H p \ge \beta_{svH} \|p\|^2$.
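
The two eigenvalue bounds can be illustrated on a small example. The Python sketch below is an illustration only: the 2x2 matrix is a hypothetical positive definite approximation, and the exact extreme eigenvalues are used in place of the uniform bounds $\beta_{svH}$ and $\beta_{lvH}$, which in the text hold for the whole sequence of matrices rather than for a single one.

```python
import math


def eig_bounds_2x2(h11, h12, h22):
    """Smallest and largest eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


def quad_form(h11, h12, h22, p):
    """p^T H p for p = (p1, p2)."""
    p1, p2 = p
    return h11 * p1 * p1 + 2.0 * h12 * p1 * p2 + h22 * p2 * p2


# Hypothetical positive definite approximation H, stored as (h11, h12, h22).
H = (4.0, 1.0, 3.0)
beta_svH, beta_lvH = eig_bounds_2x2(*H)
condition_number = beta_lvH / beta_svH

# Check beta_svH*||p||^2 <= p^T H p <= beta_lvH*||p||^2 on a few directions.
directions = [(1.0, 0.0), (0.0, 1.0), (1.0, -1.0), (-2.0, 0.5)]
bounds_hold = all(
    beta_svH * (p[0] ** 2 + p[1] ** 2) - 1e-12
    <= quad_form(*H, p)
    <= beta_lvH * (p[0] ** 2 + p[1] ** 2) + 1e-12
    for p in directions
)
```

A bounded condition number, as required by C10, corresponds here to the ratio of the largest to the smallest eigenvalue staying below a fixed constant over all iterations.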

4.3.  Global  convergence  results 

The  results  in  this  section  establish  global  convergence  properties  for  the  SQP  algorithm 
under  study. 

The first step in the proof is to show that, from assumptions A1-A2, condition C10,
and the form of step (i) in the solution of the QP subproblem, the norm of $p$ will be
uniformly bounded for any $p$ obtained as an intermediate step during the solution of the
QP subproblem.

From  the  condition  ||/?0||  <  /3p<.||c||  and  assumptions  A1-A2,  it  follows  that  ||;ro||  <  A’ 

and 

t  " (  7'0  )  Li  finmg  A  "L  A  —  A  . 

For  any  />.  <.•(/;)  <  A',  implying 

\(v  t  rr'c/)Tii(p  +  ir'g)  -  yTrr'g  <  A\ 

and  hence 

2A  / 3,vH  +  0„mg 


II P+II  ‘ffll 
giving the bound


o  _  Pnmg 
r-'nmp  —  q  i 


‘tkfiavH  +  finmg 

Pin 


Properties  of  the  search  direction 

The next result is the one presented in the previous chapter as property P1, that is, if the
norm of the search direction in any given iteration, $\|p_k\|$, is small enough, then the correct
active set must have been identified.

If  the  norm  of  the  stationary  point  where  the  search  direction  is  computed,  ||p*||,  is 
bounded  away  from  zero,  then  condition  C6  on  the  search  direction  implies  that  ||p;.||  is 
also  bounded  away  from  zero,  and  so  the  proof  of  Pi  needs  only  consider  iterations  where 
||/>t||  is  small. 

From  Lemma  3.3.1  we  know  that  if  this  norm  is  small,  we  must  be  close  to  a  stationary 
point  for  problem  NLP,  x,  and  in  that  case  we  can  use  the  results  from  Lemma  3.3.2  to 
bound  the  size  of  the  search  direction. 

Before  proving  our  first  lemma,  giving  a  bound  on  the  descent  from  the  stationary 
point,  we  introduce  bounds  for  several  quantities  that  are  related  to  the  descent  that  can 
be  achieved  in  the  QP  subproblem  at  x  when,  starting  from  the  origin,  a  step  of  the  form 
indicated  in  Section  2.3  is  taken. 

The  step  to  the  nearest  inactive  constraint  is  bounded  by 


Odj  (i  —  C  i  ^  h^spc  ^  ^  — 


The step described in condition C3 is bounded by


■  Li m  A  '^und 


q  >  tig  =  min^.J®, 


o  fidscflspm 


0lvH0l 


»  a  M  )  • 


Also,  the  following  bound  on  the  function  value  holds: 


(4.3.1) 


t ' ( O  )  T  2  P  —  0spd  —  ~20({sc0spm0g  • 


Since we only have approximations to the second derivatives, we cannot guarantee finding
a direction of negative curvature; consequently, we can only prove convergence to a
first-order KKT point. Whenever the term “solution point” is used in the following
paragraphs, what is meant is a first-order KKT point for problem NLP.




The following lemma uses the previous bounds to obtain a lower bound on the descent
available from $\bar p$ at a point that is sufficiently close to a stationary point for problem NLP.
It must be remarked that only properties of the approximation to the reduced Hessian,
$Z^T H Z$, are used in the proof, and so the result still holds under the relaxed assumptions
introduced in the next chapter.

Lemma 4.3.1. There exists a value $\beta_{spr} > 0$ such that for any stationary point $\bar x$ not a
solution of problem NLP, and any point $x$, if $\|x - \bar x\| < \beta_{spr}$ and $p$ is the search direction
obtained from a stationary point for the QP subproblem at $x$, $\bar p$, having the same active
constraints as $\bar x$, then either

$$\psi(\bar p) - \psi(p) \ge \tfrac12\beta_{spd},$$

or at $\bar x$ the Jacobian for the active constraints is singular.


Proof. We consider only the case when the Jacobian of the active constraints at $\bar x$ has full
rank.

If the lemma does not hold, there must exist a stationary point $\bar x$, not a solution for
problem NLP, and a sequence $\{x_k\}$ converging to $\bar x$, such that there exists an associated
sequence $\{\bar p_k\}$ of stationary points for the QP subproblems at the points $x_k$, having the
same active constraints as $\bar x$, and such that

$$\psi_k(\bar p_k) - \psi_k(p_k) < \tfrac12\beta_{spd}$$

for all $k$.

We show first that $\|\bar p_k\| \to 0$. Let $\bar p^*$ denote any limit point for the sequence of QP
stationary points (note that the sequence is bounded). From the assumption that the
correct active set has been identified, it must hold that $\bar p^*_Y = 0$ (since $c = 0$ for the active
constraints).

Also, from $H_k\bar p_k + g_k = A_k^T\mu_k$, selecting any convergent sequence for $H_k$ and using the
nonsingularity of $A_k$ for large $k$, $Z^T H^* Z\,\bar p^*_Z = 0$, but from the positive definiteness of $Z_k^T H_k Z_k$
it must hold that $\bar p^*_Z = 0$.

From this result it must hold that

$$a_{k_i}^T\bar p_k + c_{k_i} \to \bar c_i,$$

and for large enough $k$ (we assume that the correct active set has been identified),

$$\min_{a_{k_i}^T\bar p_k + c_{k_i} > 0}\ a_{k_i}^T\bar p_k + c_{k_i} \ge \frac{\beta_{spc}}{2}.$$


In addition to this, if $\mu_k$ denotes the QP multipliers at $\bar p_k$, then $\mu_k \to \bar\mu$ and for large
enough $k$, if $\|\bar\mu^-\| \ne 0$,

$$\max_i \mu_{k_i}^- \ge \frac{\beta_{spm}}{2}.$$

A bound similar to the one in the previous paragraphs can then be obtained for $k$ large
enough,  as  follows.  The  step  to  the  nearest  inactive  constraint  can  be  bounded  by 
Define  i  ,k  —  Ak<lk  whenever  ||//~||  ^  0.  Then 


9k(lk  +  pjllkdk  =  ejjlk. 

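Consequently, for large enough $k$,

$$\psi'(0) = (g_k + H_k\bar p_k)^T d_k \le -\beta_{dsc}\frac{\beta_{spm}}{2}.$$

Hence a bound for the step to the minimizer is given by $\beta_\alpha^0 = \tfrac12\beta_\alpha$, implying

$$\psi(\bar p_k) - \psi(\bar p_k + \alpha_k d_k) \ge \tfrac12\beta_{spd},$$

contradicting the hypothesis. |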

In the statement of Lemmas 3.3.1 and 4.3.1 the case when the Jacobian is singular has
been explicitly considered. In the next results we make use of assumption A3 to exclude
this case. (The possibility of having a rank-deficient Jacobian will not be examined.)

We shall show that properties P1 and P2 hold for this algorithm, but first we need to
introduce some notation.


$\delta^0$ denotes the value of $\delta$ associated with $\epsilon = \beta_{spr}$ in Lemma 3.3.1. If $\|\bar p\| < \delta^0$ then the
condition in Lemma 4.3.1 is satisfied.


The main result for this section is presented in the next lemma, where $p_k$ denotes the
search direction obtained as an incomplete solution for the QP subproblem.

Lemma 4.3.2. There exists a value $\epsilon > 0$ such that if $\|p_k\| < \epsilon$ then $p_k$ is a minimizer
for the QP subproblem and the correct active set at a solution has been identified.

Also, $\|p_k\| = 0$ if and only if $x_k$ is a first-order KKT point for problem NLP.

Proof. From Lemma 4.3.1, it holds that if $\|\bar p_k\| < \delta^0$ and $p_k$ was not obtained as the
minimizer for the QP subproblem, then

$$\psi(\bar p_k) - \psi(p_k) \ge \tfrac12\beta_{spd},$$

and from the continuity of $\psi$ there exists a $\delta > 0$ such that $\|p_k - \bar p_k\| \ge \delta$.
Define

$$\theta_1 = \min(\delta^0, \tfrac12\delta).$$

If $\|\bar p_k\| < \theta_1$ then

$$\|p_k\| \ge \|p_k - \bar p_k\| - \|\bar p_k\| \ge \tfrac12\delta \ge \theta_1.$$

If $\|\bar p_k\| \ge \theta_1$ then from condition C6,

$$\|p_k\| \ge \frac{\|\bar p_k\|}{\beta_{slp}} \ge \frac{\theta_1}{\beta_{slp}},$$

and thus in all cases the final point obtained has norm bounded away from zero.

If $p_k$ is obtained from the minimizer of the QP subproblem, then Lemma 3.3.1 can be
used directly. Assume that a sequence of points $\{x_k\}$ exists such that $\|p_k\| \to 0$, and all $p_k$
are obtained as the solutions of the corresponding QP subproblems, but the active sets do
not correspond to the one at a solution. By extracting a subsequence having fixed active
set (there are only a finite number of possible active sets) and taking limits, a solution for
the original problem with that active set is obtained (from assumption A6, it must hold
that the multiplier vectors converge to the multipliers at the limit point), contradicting the
hypothesis. Hence, a lower bound for $\|p_k\|$ must also exist in this case.

For the second part of the lemma, from the previous remarks, $p_k = 0$ if and only if $p_k$
is a solution for the QP subproblem. Furthermore,

$$p_k = 0 \text{ is a solution of QP} \iff g_k = A_k^T\mu_k,\ \mu_k \ge 0,\ c_k \ge 0,\ \mu_k^Tc_k = 0$$

$$\iff x_k \text{ is a first-order KKT point for NLP}, \qquad (4.3.2)$$

completing the proof. |
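
The first-order conditions in (4.3.2) translate directly into a numerical test. The following is a minimal sketch, assuming inequality constraints of the form $c(x) \ge 0$ with Jacobian rows stored in `A`; the function name and the tolerance are illustrative choices, not part of the algorithm in the text.

```python
def is_first_order_kkt(g, A, c, mu, tol=1e-8):
    """Check g = A^T mu, mu >= 0, c >= 0 and mu^T c = 0 up to a tolerance.

    g  : objective gradient, length n
    A  : constraint Jacobian as a list of m rows of length n
    c  : constraint values, length m (feasible means c >= 0)
    mu : multiplier estimates, length m
    """
    n, m = len(g), len(mu)
    # Stationarity: each component of g must equal the matching component of A^T mu.
    for j in range(n):
        at_mu_j = sum(A[i][j] * mu[i] for i in range(m))
        if abs(g[j] - at_mu_j) > tol:
            return False
    # Dual feasibility and primal feasibility.
    if any(mi < -tol for mi in mu) or any(ci < -tol for ci in c):
        return False
    # Complementarity.
    return abs(sum(mi * ci for mi, ci in zip(mu, c))) <= tol
```

As a usage example, for the hypothetical problem of minimizing $x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 \ge 0$, the point $(0.5, 0.5)$ with multiplier 1 passes the test, while the feasible point $(1, 1)$ does not for any nonnegative multiplier.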

Descent  properties 

As explained in Chapter 3, we need to impose some condition on the direction $p_k$ to ensure
that adequate descent can be obtained in each iteration. To be more precise, the bound on
the directional derivative in step (iii) of the algorithm should be satisfied. This condition
was presented in the previous chapter as property P3.

The next lemma shows that if the starting point for the QP subproblem is selected as
indicated in Chapter 2, the search direction satisfies property P3. Remember that $r_k$ was
the quantity introduced in Chapter 2 to provide a bound for the norm of the initial point
$p_{k0}$, and that its most relevant property for the proofs that follow is its relationship to
$c_k - s_k$, given in (2.2.6).

Lemma 4.3.3. There exist constants $\beta_1 > 0$, $\beta_2 > 0$, and initial points for the QP
subproblem that give values for $p_k$, the search direction, satisfying

$$p_k^Tg_k + \tfrac12 p_k^TH_kp_k \le -\beta_1\|p_k\|^2 + \beta_2\|r_k\|. \qquad (4.3.3)$$

Proof. In the proof we drop the subscript corresponding to the iteration number. Consider
the following cases:

(i) $p$ is obtained as the solution of the QP subproblem. Then, for some $\mu \ge 0$,

$$p^Tg + p^THp = p^TA^T\mu = -c^T\mu \le -\mu^Tc^- \le \|\mu\|\,\|c^-\|,$$

$$p^Tg + \tfrac12 p^THp \le -\tfrac12 p^THp + \beta_{nm\mu}\|c^-\|,$$

where $\beta_{nm\mu} > 0$ is a bound on the norm of the QP multipliers. Note that from
condition C10, $p^THp \ge \beta_{svH}\|p\|^2$.


(ii) $p$ is obtained by moving from a stationary point $\bar p$. Different cases need to be
considered separately.

• Assume that $\|\bar p\| \ge \delta^0$ and $\|\bar p - p_0\| < \tfrac12\delta^0$. If $\|c\| < \epsilon_1 = \delta^0/(2\beta_{pc})$, then from (2.2.6),

$$\|\bar p\| \le \|\bar p - p_0\| + \|p_0\| < \tfrac12\delta^0 + \beta_{pc}\|c\| < \delta^0,$$

but this is a contradiction, implying that under this condition $\|c\| \ge \epsilon_1$, in which case

$$\|p\| \le \beta_{nmp} \le \frac{\beta_{nmp}}{\epsilon_1}\|c\| = K\|c\|.$$

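Defining $\bar\beta_2 = \beta_{nmg} + \tfrac12\beta_{lvH}\beta_{nmp}$, we have

$$p^Tg + \tfrac12 p^THp \le \bar\beta_2\|p\| \le \bar\beta_2 K\|c\|.$$

• Assume that $\|\bar p - p_0\| \ge \tfrac12\delta^0$. If $\psi_i$ denotes the objective function for the QP
subproblem after the $i$th QP iteration, $\psi_i = g^Tp_i + \tfrac12 p_i^THp_i$, we can write

$$\psi_{i-1} - \psi_i = -\alpha_i(g^Td_i + p_{i-1}^THd_i) - \tfrac12\alpha_i^2 d_i^THd_i = d_i^THd_i\,\alpha_i(1 - \tfrac12\alpha_i).$$

Summing over all the iterations to the stationary point, and letting $\bar\psi = g^T\bar p + \tfrac12\bar p^TH\bar p$,

$$\psi_0 - \bar\psi = \sum_i d_i^THd_i\,\alpha_i(1 - \tfrac12\alpha_i) \ge \beta_{svH}\sum_i \|d_i\|^2\alpha_i(1 - \tfrac12\alpha_i),$$

but from $\|\bar p - p_0\| = \|\sum_i\alpha_id_i\| \ge \tfrac12\delta^0$, for at least one $i$ we must have

$$\alpha_i\|d_i\| \ge \frac{\delta^0}{2m},$$

where $m$ is a bound on the number of steps; using $\alpha_i \le 1$, it must hold that

$$\psi_0 - \bar\psi \ge \beta_{svH}\left(\frac{\delta^0}{2m}\right)^2\left(\frac{1}{\alpha_i} - \frac12\right) \ge \gamma \equiv \tfrac12\beta_{svH}\left(\frac{\delta^0}{2m}\right)^2. \qquad (4.3.4)$$

From

$$\psi_0 = g^Tp_0 + \tfrac12 p_0^THp_0 \le \bar\beta_2\|p_0\| \le \beta_{pcs}\bar\beta_2\|r\|, \qquad (4.3.5)$$

where $\bar\beta_2 = \beta_{nmg} + \tfrac12\beta_{lvH}\beta_{nmp}$, we can derive the following bound:

$$p^Tg + \tfrac12 p^THp \le \bar\psi \le \psi_0 - \gamma \le -\beta_1\|p\|^2 + \beta_{pcs}\bar\beta_2\|r\|$$

for $0 < \beta_1 \le \gamma/\beta_{nmp}^2$.

• If $\|\bar p\| < \delta^0$, then from Lemma 4.3.1,

$$\bar\psi - \psi \ge \tfrac12\beta_{spd},$$

and using (4.3.5)

$$p^Tg + \tfrac12 p^THp \le -\tfrac12\beta_{spd} + \beta_{pcs}\bar\beta_2\|r\| \le -\beta_1\|p\|^2 + \beta_{pcs}\bar\beta_2\|r\|,$$

where $0 < \beta_1 \le \beta_{spd}/(2\beta_{nmp}^2)$. |

Bounds for the penalty parameter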

We now show that the penalty parameter can be selected in such a way that the initial
descent available for the linesearch is sufficiently negative. This result is the equivalent to
property P4 in Chapter 3, although in this case (since $H_k$ is required to be positive definite
from condition C10) it seems natural to define the constant $\beta_H$ in terms of $p_k^TH_kp_k$, as in
the next lemma. In the spirit of the remarks made in the previous chapter, what we define is
a bound for the value of the parameter; the actual value should be chosen so that it satisfies
property P4 and is bounded by a finite multiple of the value $\hat\rho$ given in the following lemma.

Lemma 4.3.4. There exists a value $\hat\rho_k > 0$ such that

$$\phi_k'(0,\rho) \le -\tfrac12 p_k^TH_kp_k \qquad (4.3.6)$$

for all $\rho \ge \hat\rho_k$.

Proof. Again, we drop the subscript corresponding to the iteration number. From (3.6.2),
the condition to be satisfied can be written as

$$p^Tg + (2\lambda - \mu)^T(c - s) - \rho(c - s)^T(c - s) \le -\tfrac12 p^THp.$$

A similar but stronger condition is

$$-b^T(c - s) + \beta_2' v^T(c - s) + (2\lambda - \mu)^T(c - s) - \rho(c - s)^T(c - s) \le 0 \qquad (4.3.7)$$

for a vector $b$ uniformly bounded in norm, a constant $\beta_2' \ge 0$, and $v_i = \mathrm{sign}(c_i - s_i)$, so that
$v^T(c - s) = \|c - s\|_1$. These parameters must satisfy

$$p^Tg + \tfrac12 p^THp \le -b^T(c - s) + \beta_2' v^T(c - s).$$

The following paragraphs introduce specific definitions for $b$ and $\beta_2'$.

Rearrangement of (4.3.7) shows that a sufficient condition for $\rho$ is

$$\rho(c - s)^T(c - s) \ge (2\lambda - \mu - b + \beta_2' v)^T(c - s). \qquad (4.3.8)$$

A value $\hat\rho$ such that (4.3.8) holds for all $\rho \ge \hat\rho$ is

$$\hat\rho = \frac{\|2\lambda - \mu - b + \beta_2' v\|}{\|c - s\|}. \qquad (4.3.9)$$

The value $\hat\rho$ can be taken as (4.3.9) if $\phi'(0,\rho^-) > -\tfrac12 p^THp$, where $\rho^-$ denotes the value
of the penalty parameter at the previous iteration; and as any value greater than or equal
to $\rho^-$ otherwise. |

An immediate consequence of (4.3.6) and condition C10 is the satisfaction of property
P4,

$$\phi_k'(0) \le -\tfrac12\beta_H\|p_k\|^2 \qquad (4.3.10)$$

for $\beta_H \le \beta_{svH}$.

The value of $\hat\rho$ in the previous lemma has been given in terms of two as yet undefined
quantities, $b$ and $\beta_2'$. The value for $\beta_2'$ is related to the constant introduced in property P3,
while the value of $b$ is related to the QP multipliers at the current point. For the purpose
of satisfying property P4, $b$ can be taken to be zero, but as will be seen later, it plays an
important role in ensuring that the penalty parameter is chosen in a way that does not
inhibit superlinear convergence. The following paragraphs offer rules for the definition of
those two quantities.

The conditions that $b$ needs to satisfy to allow the algorithm to converge superlinearly
are:

$$b_k \to \lambda^*,$$

and for small enough $\|p_k\|$,

$$p_k^Tg_k + b_k^T(c_k - s_k) \le -p_k^TH_kp_k.$$

The values for $b$ and $\beta_2'$ in (4.3.9) can be selected as follows:

• Define $\mu_k$ as the QP multipliers if $p_k$ was obtained from the minimizer for the QP
subproblem; otherwise define $\mu_k$ as a multiplier estimate satisfying conditions C7-C9.

•  Define 

j  _  \  P  if  Vr<J  +  PT(c  ~s)<  - prHp , 

I  p  otherwise. 

• Define

$$\beta_2' = \max\left(0,\ \frac{p^Tg + \tfrac12 p^THp + b^T(c - s)}{\|c - s\|_1}\right).$$

Note that $\beta_2'$ is bounded, since from Lemma 4.3.3,

$$p^Tg + \tfrac12 p^THp + b^T(c - s) \le p^Tg + p^THp + b^T(c - s) \le (\beta_2 + \|b\|)\,\|c - s\|.$$


The strategy for the selection of the penalty parameter $\rho_k$ is to define its value to satisfy
property P4, while remaining small enough to be bounded by a multiple of $\hat\rho$. An example
of a selection rule having these properties is as follows.

Let

$$\rho_k = \begin{cases} \rho_{k-1} & \text{if } \phi'(0,\rho_{k-1}) \le -\tfrac12 p_k^TH_kp_k, \\ \max(\hat\rho_k, 2\rho_{k-1}) & \text{otherwise,} \end{cases} \qquad (4.3.11)$$

where $\hat\rho_k$ is defined as in Lemma 4.3.4. Then, for any iteration $k_l$ in which the parameter
needs to be increased, it holds that $\rho_{k_l} \ge 2\rho_{k_{l-1}}$, and the penalty parameter goes to infinity
if and only if its value is increased in an infinite number of iterations.
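
The threshold (4.3.9) and the selection rule (4.3.11) can be sketched in a few lines. The code below is an illustration under stated assumptions: it takes the directional derivative in the form $\phi'(0) = p^Tg + (2\lambda - \mu)^T(c - s) - \rho\|c - s\|^2$, and all function and argument names are hypothetical rather than taken from any existing implementation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi_prime0(pTg, lam, mu, cs, rho):
    """phi'(0) = p^T g + (2*lam - mu)^T (c - s) - rho * ||c - s||^2."""
    w = [2.0 * l - m for l, m in zip(lam, mu)]
    return pTg + dot(w, cs) - rho * dot(cs, cs)

def rho_hat(lam, mu, b, beta2p, cs):
    """Threshold (4.3.9): ||2*lam - mu - b + beta2p*v|| / ||c - s||, v = sign(c - s)."""
    v = [math.copysign(1.0, x) if x != 0.0 else 0.0 for x in cs]
    w = [2.0 * l - m - bi + beta2p * vi for l, m, bi, vi in zip(lam, mu, b, v)]
    return math.sqrt(dot(w, w)) / math.sqrt(dot(cs, cs))

def update_penalty(rho_prev, rho_hat_k, dphi0_prev, pHp):
    """Rule (4.3.11): keep rho_prev if phi'(0, rho_prev) <= -0.5 * p^T H p,
    otherwise increase it to max(rho_hat_k, 2 * rho_prev)."""
    if dphi0_prev <= -0.5 * pHp:
        return rho_prev
    return max(rho_hat_k, 2.0 * rho_prev)
```

Doubling on every increase is what makes the parameter unbounded exactly when it is increased infinitely often, which is the property used in the convergence proof.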


Proof  of  global  convergence 

In order to prove global convergence, we need to establish that property P5 holds. The
proof of global convergence relies on Lemmas 3.6.1 to 3.6.6 to show that the descent in each
iteration is bounded away from zero by a large enough value, and on the boundedness of
the merit function. The next lemma shows that property P5 holds for this algorithm.

Lemma 4.3.5. For any iteration $k_l$ in which the value of $\rho$ is modified,

$$\rho_{k_l}\|p_{k_l}\|^2 \le N$$

and

$$\rho_{k_l}\|c_{k_l} - s_{k_l}\| \le N$$

for some constant $N$.

Proof. All quantities in the proof refer to iteration $k_l$, and so this subscript is dropped.

From the boundedness of $\beta_2'$, Lemma 2.4.1, the definition of $b$, and condition C7 on the
multipliers, there must exist a fixed constant $N_1$ such that

$$\|2\lambda - \mu - b + \beta_2' v\| \le N_1,$$

and from the definition of $\hat\rho$ and the condition that $\rho$ has to be selected as a finite multiple
of $\hat\rho$,

$$\rho\|c - s\| \le N.$$

For the second part, using Lemma 4.3.3 (we add the term $b^T(c - s)$ using the boundedness
of $\|b\|$), we can write after some algebraic manipulation

$$\phi'(0) = p^Tg + (2\lambda - \mu)^T(c - s) - \rho\|c - s\|^2$$

$$\le -\tfrac12 p^THp - \beta_1\|p\|^2 + (2\lambda - \mu - b + \beta_2' v)^T(c - s) - \rho\|c - s\|^2,$$

and if we have $\phi'(0) > -\tfrac12 p^THp$, then

$$\beta_1\|p\|^2 < (2\lambda - \mu - b + \beta_2' v)^T(c - s) \le \|2\lambda - \mu - b + \beta_2' v\|\,\|c - s\|.$$

We reorder terms to obtain

$$\|c - s\| \ge \frac{\beta_1\|p\|^2}{\|2\lambda - \mu - b + \beta_2' v\|}. \qquad (4.3.12)$$

Multiplying both sides by $\rho$ and using the same arguments as in the first part of the
lemma yields


p\\p\\ 2  <  * 


;  v  2  > 


completing  the  proof,  | 

We can now complete the proof of global convergence.

Theorem 4.3.1. The algorithm described in this chapter has the property that

$$\lim_{k\to\infty}\|p_k\| = 0. \qquad (4.3.13)$$

Proof. If $\|p_k\| = 0$ for any finite $k$, the algorithm terminates and the theorem is true.
Hence we assume that $\|p_k\| \ne 0$ for any $k$.

When there is no upper bound on the penalty parameter, the uniform lower bound on
$\alpha$ of Lemma 3.6.6 and (3.6.15) implies that, for any $\delta > 0$, we can find an iteration index
$K$ such that

$$\|p_k\| < \delta \quad \text{for } k \ge K,$$

which implies that $\|p_k\| \to 0$ as required.

In the bounded case, we know that there exists a value $\tilde\rho$ and an iteration index $\tilde K$ such
that $\rho = \tilde\rho$ for all $k \ge \tilde K$. We consider henceforth only such values of $k$.

The proof is by contradiction. We assume that there exists $\epsilon > 0$ and an infinite
subsequence $\{k_i\}$ such that $\|p_{k_i}\| \ge \epsilon$ for all $i$. Consider only indices $i$ such that $k_i \ge \tilde K$.
Every iteration after $\tilde K$ must yield a strict decrease in the merit function because, using
Lemma 3.6.6, (4.3.10) and the fact that the penalty parameter is not modified,

$$\phi(\alpha) - \phi(0) \le \sigma\alpha\phi'(0) \le -\tfrac12\sigma\bar\alpha\beta_H\|p\|^2 < 0.$$

The adjustment of the slack variables $s$ in step (ii) of the algorithm can only lead to a further
reduction in the merit function, as $L$ is quadratic in $s$ and the minimizer with respect to $s_i$
is given by $c_i - \lambda_i/\rho$. For iterations from the subsequence we have

$$\phi(x_{k_{i+1}}) - \phi(x_{k_i}) \le \phi(x_{k_i+1}) - \phi(x_{k_i}) \le -\tfrac12\sigma\bar\alpha\beta_H\epsilon^2.$$

Therefore, since the merit function with $\rho = \tilde\rho$ decreases by at least a fixed quantity at
every step in the subsequence, it must be unbounded below. But this is impossible, from
assumptions A1, A2 and Lemma 2.4.1, so (4.3.13) must hold. |
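
The remark about the slack variables can be illustrated on a single component. This is a minimal sketch, assuming the part of the merit function that depends on $s_i$ has the augmented Lagrangian form $-\lambda_i(c_i - s_i) + \tfrac12\rho(c_i - s_i)^2$; that form is an assumption consistent with the stated minimizer $c_i - \lambda_i/\rho$, and any bound constraints on the slacks are ignored here.

```python
def merit_slack_term(s_i, c_i, lam_i, rho):
    """Slack-dependent part of the merit function for one constraint:
    -lam_i * (c_i - s_i) + 0.5 * rho * (c_i - s_i)**2 (assumed form)."""
    r = c_i - s_i
    return -lam_i * r + 0.5 * rho * r * r

def slack_minimizer(c_i, lam_i, rho):
    """Unconstrained minimizer of merit_slack_term with respect to s_i."""
    return c_i - lam_i / rho
```

Because the term is a convex quadratic in $s_i$ for $\rho > 0$, resetting the slack to this value can never increase the merit function.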

Corollary 4.3.1.

$$\lim_{k\to\infty}\|x_k - x^*\| = 0.$$

Proof. The result follows immediately from Theorem 4.3.1 and Lemma 3.1.1. |

A second corollary establishes the convergence for the multiplier estimates.

Corollary 4.3.2.

$$\lim_{k\to\infty}\|\lambda_k - \lambda^*\| = 0.$$

Proof. The convergence of the multiplier estimate is a consequence of Lemma 3.7.1, given
the results in Lemma 3.6.6 and Corollary 4.3.1. |

4.4.  Rate  of  convergence 

Under suitable additional assumptions it is possible to show that the algorithm converges
at a superlinear rate. To prove this result, we need to assume that $H_k$ converges to an
adequate approximation of $\nabla^2_{xx}L(x^*, \lambda^*)$, the Hessian of the Lagrangian function at the
solution.

In the following results the symbol $W$, defined as $W = \nabla^2_{xx}L$, will be used to denote
the Hessian of the Lagrangian function.

The conditions that we impose, in addition to C1-C10, are:




C11. Following Boggs, Tolle and Wang [BTW82], we assume

$$\|Z_k^T(H_k - W_k)p_k\| = o(\|p_k\|),$$

where $Z_k$, a basis for the null space of $A_k$, is bounded in norm and its smallest singular
value is bounded away from 0.

C12. $\|\mu_k - \lambda^*\| = o(\|x_k - x^*\|)$.

This is not the only set of conditions under which it is possible to prove that the
algorithm converges superlinearly. The next chapter introduces and justifies an alternative
set of conditions, where C12 is replaced by the requirement that the penalty parameter
must be chosen large enough near the solution.

The proof proceeds by showing first that the sequence $\{x_k + p_k - x^*\}$ converges
superlinearly, and then proving that a steplength of one is eventually attained. We begin by
showing that property P7 holds for this algorithm.

Lemma 4.4.1. If there exists an infinite subsequence of iterations $\{k_l\}$ at which the penalty
parameter is increased, then

$$\lim_{l\to\infty}\rho_{k_l}\|p_{k_l}\|^2 = 0$$

and

$$\lim_{l\to\infty}\rho_{k_l}\|c_{k_l} - s_{k_l}\| = 0.$$

Proof. We drop the subscript $k_l$ in what follows. From definition (4.3.9) and boundedness
of the ratio $\rho/\hat\rho$,

$$\rho\|c - s\| \le 2\|2\lambda - \mu - b + \beta_2' v\|,$$

and from the definition of $b$ after Lemma 4.3.4,

$$b_{k_l} \to \lambda^*.$$

As the QP multipliers satisfy $p^Tg + p^THp = -c^T\mu$, and for $k$ large enough $p$ is obtained as
the solution of the QP subproblem, $b$ eventually satisfies

$$p^Tg + b^T(c - s) \le -p^THp,$$

implying that we can take $\beta_2' = 0$ in (4.3.9).
From Corollary 4.3.2 and the previous remarks we have

$$\lim_{l\to\infty}\|2\lambda_{k_l} - \mu_{k_l} - b_{k_l} + \beta'_{2k_l}v_{k_l}\| = 0$$

and

$$\lim_{l\to\infty}\rho_{k_l}\|c_{k_l} - s_{k_l}\| = 0.$$

We can now use (4.3.12) to get

$$\lim_{l\to\infty}\rho_{k_l}\|p_{k_l}\|^2 = 0,$$

completing the proof. |

We want to show that condition (2.2.3) is satisfied for all $k$ large enough. To do this,
we need to be able to express $\phi'(0)$ in a way that is related to properties of the algorithm
already established.

We start by defining $T_k = p_k^T(g_k - A_k^T\mu_k) + p_k^TW_kp_k$, where $W_k$ is the Hessian of the
Lagrangian function using $\lambda_k$ as the Lagrange multiplier estimate. We show next that the
satisfaction of (2.2.3) is directly related to the asymptotic properties of $T_k$. In what follows,
the absence of an argument indicates values at $x_k$, and an argument of $\theta$ will indicate values
at $x_k + \theta p_k$, for any fixed $\theta \in [0, 1]$.

Lemma 4.4.2. The following relationships hold:

$$\phi_k(\theta) - \phi_k(0) = \theta(1 - \tfrac12\theta)\phi_k'(0) + \tfrac12\theta^2T_k + o(\|p_k\|^2)$$

and

$$\phi_k'(\theta) = (1 - \theta)\phi_k'(0) + \theta T_k + o(\|p_k\|^2).$$


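Proof. From (2.2.1) we have

$$\begin{aligned}\phi(\theta) - \phi = {} & F(\theta) - F - (\lambda + \theta(\mu - \lambda))^T(c(\theta) - s - \theta q) + \lambda^T(c - s) \\ & + \tfrac12\rho(c(\theta) - s - \theta q)^T(c(\theta) - s - \theta q) - \tfrac12\rho(c - s)^T(c - s),\end{aligned}$$

and using the corresponding Taylor expansions around $x_k$,

$$c_i(\theta) - s_i - \theta q_i = (1 - \theta)(c_i - s_i) + \tfrac12\theta^2p^T\nabla^2c_ip + o(\|p\|^2),$$

$$\begin{aligned}\phi(\theta) - \phi = {} & \theta g^Tp + \tfrac12\theta^2p^T\nabla^2Fp - (1 - \theta)\lambda^T(c - s) - \theta(1 - \theta)\xi^T(c - s) \\ & - \tfrac12\theta^2\sum_i\lambda_ip^T\nabla^2c_ip - \tfrac12\theta^3\sum_i\xi_ip^T\nabla^2c_ip + \lambda^T(c - s) \\ & + \tfrac12\rho(1 - \theta)^2(c - s)^T(c - s) + \tfrac12\rho(1 - \theta)\theta^2\sum_i(c_i - s_i)p^T\nabla^2c_ip \\ & - \tfrac12\rho(c - s)^T(c - s) + o(\|p\|^2),\end{aligned}$$

where $\xi = \mu - \lambda$. From Lemmas 4.4.1, 3.8.2, 3.8.3 and 3.8.4,

$$\begin{aligned}\phi(\theta) - \phi & = \theta\phi' + \tfrac12\theta^2\left(p^TWp + 2\xi^T(c - s) + \rho(c - s)^T(c - s)\right) + o(\|p\|^2) \\ & = \theta(1 - \tfrac12\theta)\phi' + \tfrac12\theta^2\left(p^TWp + p^Tg + \mu^T(c - s)\right) + o(\|p\|^2) \\ & = \theta(1 - \tfrac12\theta)\phi' + \tfrac12\theta^2\left(p^TWp + p^T(g - A^T\mu)\right) + o(\|p\|^2).\end{aligned}$$

For the second result, from (3.6.1),

$$\begin{aligned}\phi'(\theta) = {} & p^Tg(\theta) - p^TA(\theta)^T(\lambda + \theta(\mu - \lambda)) + \rho p^TA(\theta)^T(c(\theta) - s - \theta q) \\ & - \xi^T(c(\theta) - s - \theta q) + q^T(\lambda + \theta(\mu - \lambda)) - \rho q^T(c(\theta) - s - \theta q),\end{aligned}$$

and again using the corresponding Taylor series expansions we obtain

$$\begin{aligned}\phi'(\theta) = {} & p^Tg + \theta p^T\nabla^2Fp - p^TA^T\lambda - \theta p^TA^T\xi - \theta\sum_i\lambda_ip^T\nabla^2c_ip \\ & - \theta^2\sum_i\xi_ip^T\nabla^2c_ip + \rho(1 - \theta)p^TA^T(c - s) + \tfrac12\rho\theta^2\sum_i(a_i^Tp)p^T\nabla^2c_ip \\ & + \rho\theta(1 - \theta)\sum_i(c_i - s_i)p^T\nabla^2c_ip + \tfrac12\rho\theta^3\sum_i(p^T\nabla^2c_ip)^2 \\ & - (1 - \theta)\xi^T(c - s) - \tfrac12\theta^2\sum_i\xi_ip^T\nabla^2c_ip + q^T\lambda + \theta q^T\xi \\ & - \rho(1 - \theta)q^T(c - s) - \tfrac12\rho\theta^2\sum_iq_ip^T\nabla^2c_ip + o(\|p\|^2).\end{aligned}$$

From Lemmas 4.4.1, 3.8.2, 3.8.3 and 3.8.4 we finally get

$$\begin{aligned}\phi'(\theta) & = \phi' + \theta\left(p^TWp + 2\xi^T(c - s) + \rho(c - s)^T(c - s)\right) + o(\|p\|^2) \\ & = (1 - \theta)\phi' + \theta\left(p^TWp + p^T(g - A^T\mu)\right) + o(\|p\|^2),\end{aligned}$$

completing the results. |

The following results make use of the relationships introduced in this lemma only for
the particular case $\theta = 1$.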
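
For reference, setting $\theta = 1$ in the two relationships of Lemma 4.4.2 gives the following special case. This is a direct substitution, written out here only as a worked step; it assumes the lemma is read as $\phi_k(\theta) - \phi_k(0) = \theta(1 - \tfrac12\theta)\phi_k'(0) + \tfrac12\theta^2T_k + o(\|p_k\|^2)$ and $\phi_k'(\theta) = (1 - \theta)\phi_k'(0) + \theta T_k + o(\|p_k\|^2)$.

```latex
\begin{align*}
\phi_k(1) - \phi_k(0) &= \tfrac12\,\phi_k'(0) + \tfrac12\,T_k + o(\|p_k\|^2), \\
\phi_k'(1) &= T_k + o(\|p_k\|^2).
\end{align*}
```

In particular, whether the unit step is acceptable depends on how $T_k$ behaves relative to $\phi_k'(0)$ as $\|p_k\| \to 0$.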

Condition C11 implies the superlinear convergence of the sequence $\{x_k + p_k - x^*\}$, as
the next lemma shows.

Lemma 4.4.3. If condition C11 holds, then

$$\|x_k + p_k - x^*\| = o(\|x_k - x^*\|). \qquad (4.4.1)$$

Proof. Assume $k$ to be large enough that $p_k$ is obtained as the solution of the QP
subproblem, and the correct active set has been identified.

In what follows, all values refer to iteration $k$, except those corresponding to the solution.
Consider first the decomposition of $x + p - x^*$ into null-space and range-space components:

$$x + p - x^* = Zu + Yv.$$

For the range-space component we make use of the series expansion, restricted to the
active constraints at $x$:

$$0 = c^* = c + A(x^* - x) + o(\|x - x^*\|).$$

From $Ap = -c$ and the previous decomposition,

$$AYv = o(\|x - x^*\|),$$

and from assumption A3,

$$v = o(\|x - x^*\|).$$

For the null-space component, consider the corresponding Taylor series expansions
around $x$:

$$g^* = g + \nabla^2F(x^* - x) + o(\|x - x^*\|),$$

$$A^{*T}\lambda^* = A^T\lambda^* + \sum_i\lambda_i^*\nabla^2c_i(x^* - x) + o(\|x - x^*\|).$$

Combining these two results and denoting the Hessian of the Lagrangian function by $W$,

$$0 = W(x - x^*) + A^T\lambda^* - g + \sum_i(\lambda_i - \lambda_i^*)\nabla^2c_i(x - x^*) + o(\|x - x^*\|).$$

From Corollary 4.3.2 and $Hp + g = A^T\mu$,

$$W(x + p - x^*) + A^T(\lambda^* - \mu) = (W - H)p + o(\|x - x^*\|).$$

Using the decomposition of $x + p - x^*$ into null-space and range-space components, the
previous result gives

$$Z^TWZu = Z^T(W - H)p - Z^TWYv + o(\|x - x^*\|),$$

and from the properties of $v$, condition C11 and the nonsingularity of $Z^TWZ$ near the
solution,

$$u = o(\|x - x^*\|),$$

completing the proof. |

The main result of this section is given in the next theorem, where it is shown that
after a finite number of iterations a steplength of one is taken for all iterations thereafter,
implying that the algorithm achieves superlinear convergence.

Theorem 4.4.1. Under the previous conditions, the algorithm converges superlinearly.

Proof. As in Powell and Yuan [PY86], observe that the continuity of second derivatives
gives the following relationships:

$$F(x + p) = F(x) + \tfrac12(g(x) + g(x + p))^Tp + o(\|p\|^2),$$

$$c(x + p) = c(x) + \tfrac12(A(x) + A(x + p))p + o(\|p\|^2).$$

From the Taylor series expansions we have

$$F(x + p) = F(x) + g(x)^Tp + \tfrac12 p^T\nabla^2F(x)p + o(\|p\|^2),$$

$$c_i(x + p) = c_i(x) + a_i(x)^Tp + \tfrac12 p^T\nabla^2c_i(x)p + o(\|p\|^2),$$

and since (4.4.1) implies $g(x + p) = g^* + o(\|p\|)$, $a_i(x + p) = a_i^* + o(\|p\|)$, we get

$$p^T\nabla^2Fp = (g^* - g)^Tp + o(\|p\|^2),$$

$$p^T\nabla^2c_ip = (a_i^* - a_i)^Tp + o(\|p\|^2).$$

Given that $\sum_i\lambda_ip^T\nabla^2c_ip = \sum_i\mu_ip^T\nabla^2c_ip + o(\|p\|^2)$, we must have

$$p^TWp = p^T(g^* - A^{*T}\mu) - p^T(g - A^T\mu) + o(\|p\|^2), \qquad (4.4.2)$$

but from (4.3.10) condition (2.2.3) is eventually satisfied, and we have $x_{k+1} = x_k + p_k$ for
all $k$ large enough. In this case, from (4.4.1),

$$\lim_{k\to\infty}\frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0,$$

i.e. superlinear convergence, completing the proof. |
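
The limit that defines superlinear convergence can be observed on synthetic error sequences. The sketch below is purely illustrative: the two sequences are hypothetical and are not produced by the SQP algorithm; they only contrast a constant error ratio (linear convergence) with a ratio that tends to zero.

```python
def error_ratios(errors):
    """Return e_{k+1} / e_k for consecutive positive errors."""
    return [b / a for a, b in zip(errors, errors[1:])]

def generate_errors(e0, step, n):
    """Apply the error recurrence `step` n times starting from e0."""
    errors = [e0]
    for _ in range(n):
        errors.append(step(errors[-1]))
    return errors

# Linear convergence: the ratio stays at 0.5.
linear = generate_errors(0.5, lambda e: 0.5 * e, 6)
# Superlinear convergence: e_{k+1} = e_k ** 1.5, so the ratio sqrt(e_k) tends to 0.
superlinear = generate_errors(0.5, lambda e: e ** 1.5, 6)
```

Calling `error_ratios` on each sequence shows the difference: the first list is constant, the second decreases monotonically towards zero.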


4.5.  Summary 

In this chapter we have introduced and analyzed an algorithm that is based on the framework
algorithm of Chapter 2. It uses a positive definite approximation to the full Hessian of the
Lagrangian function, and an incomplete solution for the QP subproblems. The study of the
convergence properties of this algorithm has produced the following results:


• When the search direction and the multiplier estimate are defined satisfying conditions
C1-C9, and the Hessian approximation $H_k$ satisfies condition C10, the algorithm is
globally convergent.

• The algorithm converges superlinearly if the following conditions are satisfied:

C11. $\|Z_k^T(H_k - W_k)p_k\| = o(\|p_k\|)$, where $Z_k$, a basis for the null space of $A_k$, is
bounded in norm and its smallest singular value is bounded away from 0, and

C12. $\|\mu_k - \lambda^*\| = o(\|x_k - x^*\|)$.

In the chapter that follows, we will show superlinear convergence for this algorithm
under condition C11 and an alternative to C12:

C12’. When the iterates are close to the solution, the penalty parameter is chosen to be
large enough.

Chapter  5 


Approximations  to  the  Reduced 
Hessian 


5.1.  Introduction 

This  chapter  considers  an  algorithm  similar  to  the  one  presented  in  Chapter  4,  with  the 
difference  that  conditions  CIO  and  Cll  are  relaxed.  We  shall  now  only  impose  conditions 
on  the  approximation  to  the  reduced  Hessian  (but  not  on  the  full  Hessian  approximation). 

There  are  three  main  reasons  to  consider  relaxing  our  requirements.  From  the  second- 
order  optimality  conditions,  only  the  reduced  Hessian  can  be  expected  to  be  positive 
semidefinite  at  a  solution  of  the  problem,  and  so  it  seems  unreasonable  to  attempt  to 
approximate  the  full  Hessian  by  a  matrix  that  is  required  to  be  positive  definite.  We  may 
wish  instead  to  impose  positive  definiteness  only  on  the  approximation  to  the  reduced  Hes¬ 
sian.  Secondly,  the  size  of  the  reduced  Hessian  is  usually  smaller  than  that  of  the  full 
Hessian,  and  in  many  cases  the  difference  in  size  is  significant.  For  large-scale  problems, 
approximating  the  full  Hessian  is  problematic,  whereas  approximating  the  reduced  Hessian 
can  be  straightforward.  Finally,  it  is  not  known  in  general  how  to  construct  matrices  II ^ 
that  satisfy  conditions  CIO  and  Cll,  but  on  the  other  hand,  it  is  not  too  difficult  to  enforce 
satisfactory  conditions  on  the  asymptotic  properties  of  the  reduced  Hessian  approximation. 

The conditions that replace C10 and C11 take the form:

C10’. $H_k$ is uniformly bounded, and $Z_k^T H_k Z_k$ is positive definite with smallest singular
value bounded away from zero, where $Z_k$ is a basis for the null space of the active
constraints at the initial point for the QP subproblem at $x_k$.

C11’. $\|Z_k^T (H_k - W_k) Z_k p_{Z_k}\| = o(\|p_k\|)$, where $W_k$ denotes the Hessian of the Lagrangian
function at $x_k$.

The definition of the reduced Hessian requires the specification of a set of active constraints.
Crucial to the issues presented in this chapter is the notion that at each iteration
the initial "active set" of constraints, whose characteristics will be specified later, is selected
prior to attempting to solve the QP subproblem. Condition C10’ makes use of this assumption
when imposing conditions on the reduced Hessian approximation. From iteration
to iteration this active set may change, and this requires the definition of a strategy to
cope with the changing size of the reduced Hessian approximation. Fortunately, this is not
an issue in the limit, provided we can show convergence, since any reasonable definition of
the initial active set for the QP subproblem will eventually remain unaltered for successive
nonlinear iterations.

Conditions C10’ and C11’ apply only to the reduced Hessian approximation, and the
convergence proofs presented in this chapter impose no requirements on the matrices $H_k Y_k$.
It seems reasonable then to ask what is the role of these matrices, if any, in the algorithm
considered. The answer is that $Z_k^T H_k Y_k$ is needed for the computation of the null-space
component of the search direction $p_{Z_k}$, and $Y_k^T H_k Y_k$ is used to obtain the QP multipliers.
If our main concern is to define an algorithm able to deal with large-scale problems, we
may take advantage of the freedom we have in the definition of these matrices, and select
them so that the computations in which they appear become as simple as possible. A
common choice has been to take $Z_k^T H_k Y_k$ equal to zero and $Y_k^T H_k Y_k$ to be a well-behaved
positive definite matrix, for example the identity. With these choices and condition C10’,
it is clear that C10 is automatically satisfied, and the proofs in Chapter 4 only need to
be modified wherever they make use of C11, that is, for the purpose of establishing the
rate of convergence of the algorithm. (In this setting C11 can no longer be expected to be
satisfied.) The modified proof using C11’ is given at the end of the chapter.

The preceding paragraph considers only a particular set of options for the definition of
$H_k$. A more general approach to the problem would be to define an algorithm with similar
convergence properties, but requiring only condition C10’, instead of C10. This situation
arises if for a program of moderate size we are approximating the whole matrix $H_k$, but we
only require $Z_k^T H_k Z_k$ to be positive definite. Constructing $H_k$ in this way would allow us
to achieve better rates of convergence than the ones attainable when we only approximate
the reduced Hessian.
One case that this approach would cover is the use of one of the recently proposed
quasi-Newton updates that preserve only the positive definiteness of the reduced Hessian
approximation (see for example [Fen87]).

The chapter proves global convergence for an algorithm that assumes only that C10’
holds. Again, note that for particular definitions of $H_k$ that satisfy condition C10, like
the one indicated above, the global convergence proof in Chapter 4 is immediately applicable.
The chapter ends with a proof for the rate of convergence of the algorithm when the
approximation to the Hessian is required to satisfy the relaxed convergence condition C11’.

5.2.  Global  convergence  results 

We begin by introducing some notation for this chapter. Let $Z_k$, as above, be a basis for
the null space of $A_k$, the Jacobian corresponding to the constraints active at the initial
point $p_{k_0}$ for the QP subproblem at $x_k$. Let $c_k$ denote the value of the constraints in this
set at the current point, and $Y_k$ a basis for the range space of $A_k^T$. The vectors $p_Z$ and $p_Y$
are used to denote the components of $p$ in some null-space and range-space decomposition,
respectively; the specific decomposition will in general be clear from the basis matrices used
in the corresponding expressions. Finally, $w_c \le 0$ is a vector such that $Ap = -(c + w_c)$,
and we extend it to a full $m$-dimensional vector by adding zero entries corresponding to the
inactive constraints at the initial point.
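
As a concrete illustration of the null-space and range-space decomposition just described, the following sketch splits a step $p$ in two dimensions for a single active constraint. It assumes orthonormal bases $Y$ and $Z$ built directly from the constraint gradient; the function name `decompose` and the sample data are hypothetical, and the text itself does not prescribe how the bases are formed.

```python
import math


def decompose(a, p):
    """Split p (in R^2) into range-space and null-space components.

    a : gradient of the single active constraint (the one row of A).
    Returns (y, z, p_y, p_z) with p = y * p_y + z * p_z.
    Assumption for this sketch: Y and Z are orthonormal.
    """
    norm_a = math.hypot(a[0], a[1])
    if norm_a == 0.0:
        raise ValueError("constraint gradient must be nonzero")
    y = (a[0] / norm_a, a[1] / norm_a)  # Y spans the range space of A^T
    z = (-y[1], y[0])                   # Z spans the null space of A
    p_y = y[0] * p[0] + y[1] * p[1]     # range-space component
    p_z = z[0] * p[0] + z[1] * p[1]     # null-space component
    return y, z, p_y, p_z


# Example call: the product A p depends only on the range-space component.
y, z, p_y, p_z = decompose((3.0, 4.0), (1.0, 2.0))
```

Because $AZ = 0$, only $p_Y$ affects the linearized constraints, which is why the conditions of this chapter can be restricted to the reduced matrix $Z^T H Z$.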

Under condition C10’, $p_k^T H_k p_k$ may take negative values, in which case $\beta_{svH} < 0$. On
the other hand, this cannot happen for vectors in the null space of $A_k$. We therefore use
the following constant:

$\beta_{szH}$ is a positive lower bound for the smallest eigenvalue of $H_k$ on the subspace spanned
by $Z_k$: $p_Z^T Z_k^T H_k Z_k p_Z \ge \beta_{szH} \|Z_k p_Z\|^2$.

Properties P1 and P2 still hold under the new conditions. They may be proved using
arguments similar to the ones presented in Chapter 4, with only a minor modification
introduced in Lemma 5.2.1. The main change to be made to the algorithm given in Chapter
4 is the introduction of a new bound for the directional derivative of the merit function.
In Chapter 4 the bound was given as $-\frac{1}{2} p_k^T H_k p_k$, but under the relaxed assumptions on
$H_k$ the quantity $p_k^T H_k p_k$ may not be positive in all iterations. The new bound should preserve the
property that the directional derivative is bounded away from zero by a quantity related
to $\|p_k\|$. A reasonable choice is to use a linear combination of $p_Z^T Z^T H Z p_Z$ and $\|c\|^2$ to form
the bound.

A second change is the definition of $p_k$, to take into account our lack of knowledge about
the properties of $H_k$ outside the null space of the “active” constraints. In Chapter 4 the
search direction was obtained from the QP stationary point by taking a descent step with
respect to the QP objective function. In this section the step from the stationary point is
computed in terms of the value of the descent available for the linesearch, as this function in
general has better properties (convexity) than the QP objective function. A more general
approach is presented in a slightly different setting in Chapter 6.

Definition  of  the  search  direction 

As mentioned above, we modify slightly the way the incomplete solution $p_k$ is obtained from
the QP subproblem, with respect to the conditions given in Chapter 2.

The value of $p_k$ is now obtained by moving to the first stationary point for the QP
subproblem found by the algorithm, $\bar p$, and from there, if the stationary point is not a
minimizer for the QP subproblem, by taking a step along a descent direction. To proceed
further does not seem worthwhile. Since only an approximation to a particular reduced
Hessian is known, it becomes necessary to define artificially the curvature in an enlarged
space, when any constraints are removed from the active set. If we have an approximation
to the full Hessian, and the properties of the approximation outside the current subspace
are not controlled, the search directions computed may be unacceptable unless special precautions
are taken. In Chapter 6 we introduce conditions that would allow us to prevent
these difficulties.

The requirement to stop at the first stationary point allows us to work with the reduced
Hessian approximation for the initial active set exclusively, and so the possible lack of
positive definiteness outside the corresponding subspace does not affect any of the steps
taken during the solution process for the QP subproblem. In particular, conditions C4 and
C5 will not be used in what follows.

Define $v_c$ to be such that if $p = \bar p + \alpha d$, then $w_c = \alpha v_c$, where clearly $v_c \le 0$. Assume
that $d$ is computed so that conditions C1, C2 and C6 are satisfied, and in particular the
following condition holds.

</ (I  f  p1  lid  <  J.isrl'jp 

foi  some  ...  ().  Note  that  condition  Cl  implies  that  r,.  must  be  bounded,  ||iy||  < 

Condition C3 is replaced by the following condition:

C3’. The step $\alpha$ is taken as the step to the minimizer of $\varphi(\zeta)$, where

$\varphi(\zeta) = g^T(\bar p + \zeta d) + \frac{1}{2}\left( (\bar p_Z + \zeta d_Z)^T Z^T H Z (\bar p_Z + \zeta d_Z) + \|c + \zeta v_c\|^2 \right)$.

To be more precise, if $\varphi'(0) \ge 0$ then let $\alpha = 0$. Otherwise, let $\alpha_c$ be the step to the
nearest inactive constraint and define

$\alpha_m = -\frac{\varphi'(0)}{\varphi''}$,

$\alpha = \min(\alpha_c, \alpha_m, \alpha_M)$,

where $\alpha_M$ is a specified bound on the largest acceptable step.
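
The step rule in condition C3’ can be sketched in a few lines. The sketch assumes that the slope $\varphi'(0)$ and the curvature $\varphi''$ of the quadratic model are already available as scalars. The function name `c3_step` is hypothetical, and the handling of nonpositive curvature (dropping the minimizer bound and relying on the other two) is an assumption of this sketch rather than part of the condition as stated.

```python
def c3_step(dphi0, d2phi, alpha_c, alpha_max):
    """Steplength along d following the rule in condition C3'.

    dphi0     : slope of the model at zeta = 0
    d2phi     : (constant) curvature of the quadratic model
    alpha_c   : step to the nearest inactive constraint
    alpha_max : specified bound on the largest acceptable step
    """
    if dphi0 >= 0.0:
        # No descent is available along d: take no step.
        return 0.0
    if d2phi > 0.0:
        alpha_m = -dphi0 / d2phi   # step to the minimizer of the model
    else:
        # Assumption of this sketch: with no positive curvature the
        # minimizer bound is inactive.
        alpha_m = float("inf")
    return min(alpha_c, alpha_m, alpha_max)
```

For instance, with slope -2 and curvature 4 the minimizer lies at 0.5, so the call `c3_step(-2.0, 4.0, 1.0, 10.0)` returns that value unless a constraint or the step bound is met first.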

Also, from the conditions on $p_0$ in step (i) of the rules to compute the incomplete search
direction, and from the way $\alpha$ and $d$ are obtained, we can show again that $\|p\|$ is uniformly
bounded for any $p$ obtained during the solution of the QP subproblem.

If  A'  denotes  a  uniform  bound  on  the  norm  of  the  initial  point  obtained  from  (2.2.6) 
and  assumption  A2,  )jpo)|  <  A’,  we  have 

y(Po)  <  PnmgK  +  \{PhU  +  dlmA)K2  =  A', 

and  for  any  p  up  to  p,  as  pY  =  Pv0,  it  holds  that  ^(p)  <  K,  and  hence 

\{pz  +  (7JnZ)-xZTg)rZTHz{Pz+(ZTHZrlZrg)  -  \grZ(ZTH Z)~x ZTg  <  K. 

From  this  result,  we  get  the  bound 

\\Pz  +  ( ZTHZ)-'ZTg\\ 2  < 

PszH 

implying 

for  the  step  along  d,  note  that 

,  flnmg  +  $  sz  II A  +  0  nm  A  ^ 

and from $\|d\| \le \beta_{nmd}$ we must have that for some $\beta_{nmp}$,

$\|p\| \le \beta_{nmp}$.

The argument in the proof of Lemma 4.3.2 still applies to this algorithm, except for
one minor change induced by the introduction of condition C3’. It now becomes necessary
to prove that a bound similar to the one in (4.3.1) still applies to this algorithm, at least
for the case when $\|\bar p\|$ is small enough (otherwise, condition C6 is sufficient to imply the
result). The following lemma establishes this result, and so it indirectly proves the validity
of properties P1 and P2 for the algorithm.

Lemma  5.2.1.  If  \\p\\  <  where 

cl  (  ,,,  fidsefispm  \ 

8oivH  Pund  ~F  P"mvPnmA 

then $\alpha$ is bounded away from zero in condition C3’.


Proof. From the definition of $\varphi'(0)$,

$\varphi'(0) = g^T d + \bar p_Z^T Z^T H Z d_Z + c^T v_c$

$\quad = g^T d + \bar p^T H d - \bar p^T H Y d_Y - \bar p_Y^T Y^T H Z d_Z - v_c^T A \bar p$


zJvT, 


f  r.  fl  +  (2  ‘hvlJ  Junj  -f  ,  lumi’fn  m  A 


')  • 


For  ||,1||  <b'. 


y'(0)  <  v’fl  +  \3dsct3sprn  < 


and  from  condition  C2, 


y  ( d  )  —  4  Pdsc^spm  • 

The step to the minimizer of $\varphi(\zeta)$ is given by $\alpha = -\varphi'(0)/\varphi''$, and as

=  dlzZrH Zdz  +  lie,!!2  <  max(4/t,^m,  WLd  =  /*" 


we can write a bound for this step as


>  fll  = 


fidsclf 


spm 


M3" 


Again,  selecting  Tj  =  min(/3", )  and  using  the  same  reasoning  as  in  the  discussion  before 
Lemma  1.3.2.  we  get  that  the  step  satisfies  et  >  \i3g.  I 

From this result, properties P1 and P2 follow along the lines presented in Lemma 4.3.2.

Descent properties

The  next  result  that  we  need  to  establish  is  that  the  descent  condition  given  in  property 
P3  holds  for  this  algorithm. 

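Lemma 5.2.2. There exist constants $0 < \beta_1 < \frac{1}{2}$, $\beta_2 > 0$, and initial points for the QP
subproblem that give values for the search direction $p_k$ satisfying

$p_k^T g_k + \frac{1}{2}\left( p_{Z_k}^T Z_k^T H_k Z_k p_{Z_k} + \|c_k + w_{c_k}\|^2 \right) \le -\beta_1 \left( p_{Z_k}^T Z_k^T H_k Z_k p_{Z_k} + \|c_k + w_{c_k}\|^2 \right) + \beta_2 \|r_k\|$.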


Proof. Since no constraints are deleted from the active set until a stationary point is
reached, we must have $p_Y = p_{Y_0}$. Consider the following cases:

(i) $p$ is obtained as the solution of the QP subproblem. Then for some $\mu \ge 0$,

PTg  +  PrHp  =  pTArp  =  -cTp  <  ||/I||||c-||  <  IIpHIM] 
and as $w_c = 0$ at the solution, $\|c\| \le \beta_{nmA}\|p_0\|$ and $p_Y = p_{Y_0}$,

p1  II  p  =  p\ZJHZpz  +  (p+  Zpz)THYpy  o  <  PyZ1  II  Zpz  +  20ivnJnmp$pcs\\r\\. 
and we finally get

$p^T g + \frac{1}{2}\left( p_Z^T Z^T H Z p_Z + \|c\|^2 \right) \le -\frac{1}{2}\left( p_Z^T Z^T H Z p_Z + \|c\|^2 \right) + K\|r\|$,

where

^  =  $nmu  “h  H  finmpfi  pc 3  T  dnmA^pcs' 

(ii) $p$ is obtained by taking a descent step on $\varphi$ from a stationary point $\bar p$. There are a
number of possibilities:

•  If  ||p|!  >  4  and  ||p  —  p0||  <  we  need  to  consider  different  values  for  ||c||.  If 
|k'||  <  ( i  =  b]  /(2/fpr),  then 


but  this  is  a  contradiction,  so  we  mir‘.  have  |)cjj  >  fj,  in  which  case 

lbll</?nmp<^lk||  =  A-||C||, 

<1 
implying  that  for  /j.*  =  3nmg  +  ((5lvlt  +  3lmA)3nmv , 

PT9  +  (pTzZTHZpz  +  M2)  <  312\\p\\  <  $A'||c||. 

i  lIUll!\ .  Using  ||c||  £  ^nm/t||Po||  ^  3nmA  3pcs  j  I  ^  |  j  , 

PT9  +  \{p‘zZrtiZpz  +  ||c||2)  <  -\(pTzZ7 II Zpz  +  ||c||2)  +  A'||r|| 

W  he]  e  A  —  23‘2  3nrnp$nmA$pc3 1  &  ■ 

•  Let  yi-  denote  the  function  used  to  bound  the  desired  descent.  If  \\p—  poll  >  l f1 1  then, 
after  the  At h  QP  iteration, 

'rk  =  gTPk  +  L2{prZkZTHZpZk  +  ||c||2). 

Making use of the fact that $p_{Y_k} = p_{Y_0}$ for all $k$ up to the stationary point, we can
write

$\varphi_{k-1} - \varphi_k = \psi_{k-1} - \psi_k + p_{Y_0}^T Y^T H Z (p_{Z_k} - p_{Z_{k-1}})$,

where $\psi_k$ is the QP objective function after iteration $k$. For all iterations between the
initial point and the stationary point, it holds that

$\varphi_0 - \varphi = \psi_0 - \psi + p_{Y_0}^T Y^T H Z (p_Z - p_{Z_0})$.

We can use (4.3.4) to write

$|p_{Y_0}^T Y^T H Z (p_Z - p_{Z_0})| \le 2\beta_{lvH}\beta_{nmp}\|p_0\| \le 2\beta_{lvH}\beta_{nmp}\beta_{pcs}\|r\| = K'\|r\|$.

If we let $\gamma = \psi_0 - \psi$, it follows that

$\varphi \le \varphi_0 - \gamma + K'\|r\|$.

From  one  of  the  intermediate  results  in  the  proof  of  Lemma  4.3.3,  wo  have  7  > 
2 ihziPZ' ' 1 1m  )2.  Consequently, 

/ff  +  WzFllZpz  +  ||r  +  Well2)  <  -faiplFUZpz  +  ||f  +  H-v||2)  +  A’ || r || , 


ftbjfffinmp 


where  I\  =  K'  +  /A*  and 

•  If $\|\bar p\| < \epsilon'$, we know from Lemma 5.2.1 that we have descent for $\varphi$, and the minimal
descent rate is bounded by


9 


£W 

W  ' 


where  -  v',(0)/y‘"  is  the  step  to  the  minimizer.  As  the  step  is  at  least  ,  by  assuming 
the  .unite  (minimum)  rate  of  descent  as  before,  we  get  for  the  descent  from  p. 


2^(0)  \i$l  >  iPdscfiipm  0 3 • 

Bv  select  ing 

o  <  ,j,  < 

,H@wnp 

we can write

$p^T g + \frac{1}{2}\left( p_Z^T Z^T H Z p_Z + \|c + w_c\|^2 \right) \le -\beta_1 \left( p_Z^T Z^T H Z p_Z + \|c + w_c\|^2 \right) + K\|r\|$
for  A  ■-  This  completes  the  proof.  | 


Bounds  for  the  penalty  parameter 

We now determine modified bounds for the penalty parameter. We assume that the multiplier
estimates are obtained according to conditions C7-C9, given in Chapter 2, and in
addition we impose an extra condition on the choice of the initial working set made at each
iteration:

C13. The initial active set must be selected so that there exists an $\epsilon'' > 0$ such that if
$\|p_k\| < \epsilon''$, then the active set at $p_k$ is the initial active set.

From the definition of the search direction, $p_k$, this condition implies that eventually $p_k$
must be the solution of the QP subproblem, and it must be determined in just one QP
iteration (no constraints added or deleted).

Define the auxiliary vector

$w_g = Z^T g + Z^T H p$.   (5.2.1)

Property P4 is an immediate consequence of the following lemma:

Lemma 5.2.3. There exists a value $\hat\rho_k$ such that

$\phi_k'(0, \rho) \le -\frac{1}{2}\left( p_{Z_k}^T Z_k^T H_k Z_k p_{Z_k} + \|c_k + w_{c_k}\|^2 \right)$   (5.2.2)

for all $\rho \ge \hat\rho_k$.




Proof. From the expression for $\phi'(0)$ given in (3.6.2), we can write, using (5.2.1),

$\phi'(0) = p_Z^T Z^T g + p_Y^T Y^T g + \mu^T(c - s) - 2\xi^T(c - s) - \rho\|c - s\|^2$

$\quad = -p_Z^T Z^T H Z p_Z - p_Z^T Z^T H Y p_Y + p_Z^T w_g + p_Y^T Y^T g - \mu^T A Y p_Y - \mu^T A Z p_Z - \mu^T s - 2\xi^T(c - s) - \rho\|c - s\|^2$

$\quad = -p_Z^T Z^T H Z p_Z - \|c + w_c\|^2 + b^T(c + w_c) + p_Z^T(w_g - Z^T A^T \mu) - \mu^T s - 2\xi^T(c - s) - \rho\|c - s\|^2$,


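where $\xi = \mu - \lambda$ and $b$ is defined from

$\theta = \|c + w_c\|^2 - p_Y^T Y^T (H Z p_Z + A^T \mu - g)$,

$b = \begin{cases} 0 & \text{if } \|c + w_c\| = 0, \\ \dfrac{\theta}{\|c + w_c\|^2}\,(c + w_c) & \text{otherwise.} \end{cases}$

Consequently, $b^T(c + w_c) = \theta$, as $\|c + w_c\| = 0 \Rightarrow p_Y = 0$.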

If $b$ and $w_c$ are redefined to be full $m$-vectors by giving the value zero to all components
corresponding to constraints not in the initial active set, we may rewrite the previous
equation as

$\phi'(0) = -p_Z^T Z^T H Z p_Z - \|c + w_c\|^2 + b^T w_c + p_Z^T(w_g - Z^T A^T \mu) + (b - \mu)^T s + (b - 2\xi)^T(c - s) - \rho\|c - s\|^2$.


The condition to be satisfied can then be expressed as

$b^T w_c + p_Z^T(w_g - Z^T A^T \mu) + (b - \mu)^T s + (b - 2\xi)^T(c - s) - \rho\|c - s\|^2 \le \frac{1}{2}\left( p_Z^T Z^T H Z p_Z + \|c + w_c\|^2 \right)$,

and a stronger condition on $\rho$ is given by

$\rho\,(c - s)^T(c - s) \ge (b - 2\xi)^T(c - s) + b^T w_c + p_Z^T(w_g - Z^T A^T \mu) + (b - \mu)^T s$.   (5.2.3)

A value $\hat\rho$ such that (5.2.3) holds for all $\rho \ge \hat\rho$ is

$\hat\rho = \frac{\|b\| + 2\|\xi\|}{\|c - s\|} + \frac{\max\left(0,\; b^T w_c + p_Z^T(w_g - Z^T A^T \mu) + (b - \mu)^T s\right)}{\|c - s\|^2}$,   (5.2.4)

completing the result. |
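
To make the bound on the penalty parameter concrete, the following sketch evaluates a value of the form $(\|b\| + 2\|\xi\|)/\|c - s\| + \max(0, t)/\|c - s\|^2$ and checks the stronger condition on $\rho$ numerically. Here $t$ stands for the scalar $b^T w_c + p_Z^T(w_g - Z^T A^T \mu) + (b - \mu)^T s$, which is passed in precomputed. The function names and the sample vectors are hypothetical, and the sketch assumes $c \ne s$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def rho_hat(b, xi, c_minus_s, t):
    """Lower bound on the penalty parameter (illustrative sketch).

    b, xi     : auxiliary vector b and multiplier step xi = mu - lambda
    c_minus_s : the vector c - s (assumed nonzero here)
    t         : scalar b'w_c + p_Z'(w_g - Z'A'mu) + (b - mu)'s
    """
    r = norm(c_minus_s)
    if r == 0.0:
        raise ValueError("this sketch assumes c != s")
    return (norm(b) + 2.0 * norm(xi)) / r + max(0.0, t) / r ** 2


def stronger_condition_holds(rho, b, xi, c_minus_s, t):
    """Check rho * ||c - s||^2 >= (b - 2 xi)'(c - s) + t."""
    b_minus_2xi = [bi - 2.0 * xii for bi, xii in zip(b, xi)]
    lhs = rho * dot(c_minus_s, c_minus_s)
    return lhs >= dot(b_minus_2xi, c_minus_s) + t
```

The first term of the bound covers $(b - 2\xi)^T(c - s)$ through the Cauchy-Schwarz inequality, and the second covers the remaining scalar whenever it is positive, so any $\rho$ at or above the returned value passes the check.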

We now prove property P4. As a consequence of assumption A3 and the definition of
$w_c$ there exists a constant $\beta_{pcf}$ such that

$\frac{\|Y_k p_{Y_k}\|}{\|c_k + w_{c_k}\|} \le \beta_{pcf}$.   (5.2.5)

From condition C10’ and (5.2.5), we then have

$p_Z^T Z^T H Z p_Z + \|c + w_c\|^2 \ge \beta_{szH}\|Z p_Z\|^2 + \beta_{pcf}^{-2}\|Y p_Y\|^2 \ge \min(\beta_{pcf}^{-2}, \beta_{szH})\,\|p\|^2$.   (5.2.6)

Defining $\beta_H = \frac{1}{2}\min(\beta_{pcf}^{-2}, \beta_{szH})$ we obtain property P4,

$\phi'(0) \le -\beta_H \|p\|^2$.   (5.2.7)

Another result that is useful in the lemmas that follow is the boundedness of the auxiliary
variable $b$. From (5.2.5), assumptions A1-A2 and condition C10’, we have that

$\|b\| \le \|c + w_c\| + \frac{\|Y p_Y\|}{\|c + w_c\|}\,\|H Z p_Z + A^T \mu - g\| \le N'$.   (5.2.8)

Regarding the penalty parameter, the same approach that was presented in the previous
chapter still applies in this case; that is, we define its value to satisfy property P4 and to be
small enough so that $\rho/\hat\rho$ is bounded. An example of a selection rule having these properties
is given in the next paragraph.

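Let $\vartheta_k = p_{Z_k}^T Z_k^T H_k Z_k p_{Z_k} + \|c_k + w_{c_k}\|^2$. As in (4.3.11), we define the bound for the
penalty parameter by

$\rho_k = \begin{cases} \rho_{k-1} & \text{if } \phi'(0, \rho_{k-1}) \le -\frac{1}{2}\vartheta_k, \\ \max(\hat\rho_k, 2\rho_{k-1}) & \text{otherwise,} \end{cases}$   (5.2.9)

where $\rho_0 = 0$ and $\hat\rho_k$ is defined by (5.2.4).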
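
The selection rule for the penalty parameter can be sketched as a small function: the previous value is kept whenever the directional derivative of the merit function already satisfies the required descent bound, and otherwise the parameter is raised to at least twice its previous value. The function and argument names are hypothetical; `descent_measure` stands for the quantity $p_Z^T Z^T H Z p_Z + \|c + w_c\|^2$, and the derivative is assumed to have been evaluated with the previous penalty value.

```python
def update_penalty(rho_prev, rho_hat_k, dphi0_at_prev, descent_measure):
    """One application of the penalty selection rule (illustrative sketch).

    rho_prev        : penalty parameter from the previous iteration
    rho_hat_k       : lower bound computed for the current iteration
    dphi0_at_prev   : merit-function slope at zero, using rho_prev
    descent_measure : p_Z'Z'HZp_Z + ||c + w_c||^2 at the current iterate
    """
    if dphi0_at_prev <= -0.5 * descent_measure:
        # Sufficient descent already holds: keep the penalty unchanged.
        return rho_prev
    # Otherwise increase: at least the lower bound and at least doubled.
    return max(rho_hat_k, 2.0 * rho_prev)
```

Doubling on every increase means that an unbounded penalty sequence can only arise from infinitely many increases, which is the situation the subsequent lemma analyzes.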

The  next  result  establishes  property  P5. 

Lemma 5.2.4. Assuming the bound given in (5.2.9) for the multipliers, for any iteration
$k_l$ in which the value of $\rho$ is modified,

$\rho_{k_l}\|p_{k_l}\|^2 \le N$

and

$\rho_{k_l}\|c_{k_l} - s_{k_l}\| \le N$,

for some constant $N$.

Proof. If the penalty parameter is increased only at a finite number of iterations, the result
follows from assumption A2, Lemma 2.1.1 and the boundedness of $\|p_k\|$. For the rest of
the proof we then assume that there exists an infinite sequence of iterations along which
the penalty parameter is increased without bound.

From  Lemma  5.2.2. 


-  p'fl  +  (2A  -  //) '(c  -  s)  ~  i>\\c  -  s ||2 

<  -17  -r  4  ){plxZI U  Zpz  4-  ||c  +  i/y||‘)  +  (2A  -  /<  +  d2  -  s)  -  p\\c  -  .s||4 


and  if  <;/(())  >  —  \{pl/Zl  11  Zpy  +  ||r  +  u\.\\2)  then,  from  the  boundedness  of  the  multipliers 
and  J>-  and  from  (5.2.0). 


•"!!  > 


4 


|2A  -  /(  -f  .^0 r || 


(plz'HZpy  +  ||c  +  tc,  II*)  >  A  ,!!/>, 


(5.2.10) 


From  assumptions  A1  and  A2.  Lemma  2.1.1.  (5.2.S)  and  definition  (5.2.1). 


P\\c~s\\2  <  .V,, 

and  from  (5.2.10)  it  follows  that 

p||pj|-'  <  .V,.  (5.2.11) 

Under the assumption that $\rho_{k_l} \to \infty$, this result implies that $\|p_{k_l}\| \to 0$.

We now show that for a large enough value of the penalty parameter $\rho_{k_l}$ it must hold
that

$\max\left(0,\; b_{k_l}^T w_{c_{k_l}} + p_{Z_{k_l}}^T (w_{g_{k_l}} - Z_{k_l}^T A_{k_l}^T \mu_{k_l}) + (b_{k_l} - \mu_{k_l})^T s_{k_l}\right) = 0$.

If $\|p_{k_l}\| \to 0$, we can show that $\|b_{k_l}\| \to 0$. From condition C13 we must eventually have
$w_{c_{k_l}} = 0$, and so $\|c_{k_l} + w_{c_{k_l}}\| \to 0$. Furthermore, from Lemma 3.1.1 and condition C8 on
the multipliers, $\|A_{k_l}^T \mu_{k_l} - g_{k_l}\| \to 0$. From (5.2.8) we can write the bound

$\|b_{k_l}\| \le \|c_{k_l} + w_{c_{k_l}}\| + \beta_{pcf}\left( \|H_{k_l} Z_{k_l} p_{Z_{k_l}}\| + \|A_{k_l}^T \mu_{k_l} - g_{k_l}\| \right)$,

and therefore we have $\|b_{k_l}\| \to 0$.

Since  j|/;t.  [J  —  0.  there  exists  an  index  l\  such  that  hk  <  pk(  for  all  F;  >  l\  .  (We 
use  strict  complementarity  at  the  solution.)  Also,  for  A'/  large  enough  it  must  hold  that 
\\l>kl\\  <  and  from  condition  C13  in  that  iteration  we  must  have  “w,  =  <>«/' £.-4, 4  =0 
and  u\kl  —  0.  Hence, 

+  /4*.,("'.vtt,  -  )  +  (4-,  ~  )?-%  -  (4,  -  Pkt  )f-sfc,  <  (F 


Proof of global convergence

The proof of global convergence follows along the same lines as in the previous chapter.

Theorem 5.2.1. The algorithm described in this chapter has the property that

$\lim_{k \to \infty} \|p_k\| = 0$.   (5.2.13)

Proof. Follows from the same arguments used in the proof of Theorem 4.3.1. |

Corollary 5.2.1.

$\lim_{k \to \infty} \|x_k - x^*\| = 0$.

Proof. The result follows immediately from Theorem 5.2.1 and Lemma 3.4.1. |

Corollary 5.2.2.

$\lim_{k \to \infty} \|\lambda_k - \lambda^*\| = 0$.

Proof. The result follows from Lemma 3.7.1, given the results in Lemma 3.6.6 and Corollary
5.2.1. |

5.3. Rate of convergence


In this chapter we assume that our approximation to the Hessian is only accurate on the
null space of the active constraints. A consequence of the use of less precise information is
a degradation in the rate of convergence for the algorithm. We are now only able to show
that under condition C11’ the algorithm converges two-step superlinearly (as opposed to
the one-step superlinear convergence established in Chapter 4). The proof follows the same
general pattern presented in Chapter 3.

We  start  by  establishing  property  P7. 

Lemma 5.3.1. For iterations $k_l$ in which the penalty parameter is increased, assuming an
infinite sequence of such iterations occurs in the algorithm,

$\lim_{l \to \infty} \rho_{k_l}\|p_{k_l}\|^2 = 0$

and

$\lim_{l \to \infty} \rho_{k_l}\|c_{k_l} - s_{k_l}\| = 0$.

Proof. For large enough $\rho$, from definition (5.2.4) and the remarks in Lemma 5.2.4,

$\rho\|c - s\| \le 2\|b\| + 4\|\xi\|$.

From Corollary 5.2.2, $\|\xi_{k_l}\| \to 0$ and $\|A_{k_l}^T \mu_{k_l} - g_{k_l}\| \to 0$, and using Theorem 5.2.1 and
Corollary 5.2.1, from (5.2.8) and condition C13,

$0 \le \|b_{k_l}\| \le \|c_{k_l} + w_{c_{k_l}}\| + \frac{\|Y_{k_l} p_{Y_{k_l}}\|}{\|c_{k_l} + w_{c_{k_l}}\|}\,\|H_{k_l} Z_{k_l} p_{Z_{k_l}} + A_{k_l}^T \mu_{k_l} - g_{k_l}\| \to 0$,

giving

$\lim_{l \to \infty} \rho_{k_l}\|c_{k_l} - s_{k_l}\| = 0$.

But (5.2.10) implies

$\lim_{l \to \infty} \rho_{k_l}\|p_{k_l}\|^2 = 0$,

completing the proof. |

Our goal is to prove a result similar to Theorem 4.1.1 for the algorithm introduced in
this chapter. As in the previous chapter, some additional conditions need to be imposed. It
was mentioned at the beginning of the chapter that our interest is to study the consequences
of approximating only the reduced Hessian. In this case, condition C11 cannot be enforced,
and it is replaced by

C11’. Following Powell [Po78], we assume

$\|Z_k^T (H_k - W_k) Z_k p_{Z_k}\| = o(\|p_k\|)$.

Note that this condition, together with condition C10’, implies that for points close enough
to the solution we must have

$p_{Z_k}^T Z_k^T W_k Z_k p_{Z_k} \ge \frac{1}{2}\beta_{szH}\|Z_k p_{Z_k}\|^2$.

As a consequence of the use of less restrictive conditions on $H_k$, condition C12 is no
longer adequate, and it also needs to be replaced. The new condition does not apply to the
multiplier estimates, which now are only required to satisfy C7-C9; instead, it limits the
acceptable values for the penalty parameter $\rho_k$.

C12’. When the iterates are close to the solution, the penalty parameter is chosen to be
“large enough”.

The following results will make clear what is a suitable lower bound for the penalty parameter.

If  these  conditions  hold,  using  the  previous  results  and  Lemmas  3.8.2  to  4.4.3,  we  can 
show  that  the  algorithm  converges  two-step  superlinearly. 

Theorem 5.3.1. There exists a value $\bar\rho$ such that if $\rho_k$ is selected satisfying $\rho_k \ge \bar\rho$, then
the algorithm converges two-step superlinearly.

Proof. We start by proving that if $\rho_k$ is large enough, condition (2.2.3) is satisfied for all
large $k$. In the rest of the proof we drop the subscript denoting the iteration number.

As in Byrd and Nocedal [BN88], we let

$L(x, \lambda, s) = F(x) - \lambda^T(c(x) - s)$.   (5.3.1)

We can now use a Taylor series expansion to write

$\Delta L = L(x + p, \lambda, s) - L(x, \lambda, s) = g^T p - \lambda^T A p + \frac{1}{2} p^T W p$,   (5.3.2)

where $W = \nabla^2 L(x + \theta p, \lambda, s)$ and $0 \le \theta \le 1$.

Rearranging terms,

$\Delta L = p_Y^T Y^T(g - A^T\lambda) + p_Z^T Z^T g + \frac{1}{2} p_Z^T Z^T W Z p_Z + (\frac{1}{2} Y p_Y + Z p_Z)^T W Y p_Y$

$\quad = p_Y^T Y^T(g - A^T\lambda) + \sigma p_Z^T Z^T g + (1 - \sigma) p_Z^T Z^T (W - H) Z p_Z - (1 - \sigma) p_Z^T Z^T H Y p_Y + (\frac{1}{2} Y p_Y + Z p_Z)^T W Y p_Y - (\frac{1}{2} - \sigma) p_Z^T Z^T W Z p_Z$.

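Assume now that $k$ is large enough so that $\|W\| \le 2\|W^*\| = \beta^*$, where $W^*$ indicates the
Hessian of the Lagrangian function at the solution, and also that the bound
$p_Z^T Z^T W Z p_Z \ge \frac{1}{2}\beta_{szH}\|Z p_Z\|^2$ holds. We may rewrite condition C11’ in the form

$p_{Z_k}^T Z_k^T (W_k - H_k) Z_k p_{Z_k} = \omega_k \|Z_k p_{Z_k}\|\,\|p_k\|$,

where $\omega_k \to 0$. Consequently

$\Delta L \le p_Y^T Y^T(g - A^T\lambda) + \sigma p_Z^T Z^T g - \left( (\frac{1}{2} - \sigma)\frac{1}{2}\beta_{szH} - (1 - \sigma)\omega \right)\|Z p_Z\|^2 + \frac{1}{2}\beta^*\|Y p_Y\|^2 + \left( (1 - \sigma)(\beta_{lvH} + \omega) + \beta^* \right)\|Z p_Z\|\,\|Y p_Y\|$.

For $k$ large enough, there exist positive constants $a_1$, $a_2$ (e.g., take $a_1 = 2(1 - \sigma)\beta_{lvH} + \beta^*$
and $a_2 = \frac{1}{4}(\frac{1}{2} - \sigma)\beta_{szH}$), such that

$\Delta L \le p_Y^T Y^T(g - A^T\lambda) + \sigma p_Z^T Z^T g + \frac{1}{2}\beta^*\|Y p_Y\|^2 + a_1\|Z p_Z\|\,\|Y p_Y\| - a_2\|Z p_Z\|^2$.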

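We now study the merit function (2.2.1) at $\alpha = 1$. We can write it as

$\phi(1) = L(x + p, \lambda, s) + \left( L(x + p, \mu, s + q) - L(x + p, \lambda, s) \right) + \frac{1}{2}\rho\|c(x + p) - s - q\|^2$

$\quad \le L(x, \lambda, s) + \left( \lambda^T(c(x + p) - s) - \mu^T(c(x + p) - s - q) \right) + \frac{1}{2}\rho\|c(x + p) - s - q\|^2 + p_Y^T Y^T(g - A^T\lambda) + \sigma p_Z^T Z^T g + \frac{1}{2}\beta^*\|Y p_Y\|^2 + a_1\|Z p_Z\|\,\|Y p_Y\| - a_2\|Z p_Z\|^2$.

Using $c_i(x + p) - s_i - q_i = p^T \nabla^2 c_i(z_i) p$, where $z_i = x + \theta_i p$ for some $\theta_i \in [0, 1]$, we have

$\phi(1) \le \phi(0) + p_Y^T Y^T(g - A^T\lambda) + \sigma p_Z^T Z^T g + \lambda^T q - \sum_i \xi_i\, p^T \nabla^2 c_i(z_i) p - \frac{1}{2}\rho\|c - s\|^2 + \frac{1}{2}\rho \sum_i \left( p^T \nabla^2 c_i(z_i) p \right)^2 + a_1\|Z p_Z\|\,\|Y p_Y\| - a_2\|Z p_Z\|^2 + \frac{1}{2}\beta^*\|Y p_Y\|^2$

$\quad \le \phi(0) + \sigma\phi'(0) - \sigma p_Y^T Y^T g - \sigma(2\lambda - \mu)^T(c - s) + \lambda^T q + p_Y^T Y^T(g - A^T\lambda) - (\frac{1}{2} - \sigma)\rho\|c - s\|^2 + a_1'\|Z p_Z\|\,\|Y p_Y\| - a_2'\|Z p_Z\|^2 + \beta^*\|Y p_Y\|^2$,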

where  we  have  made  use  of  Lemma  3.8.2  and  the  facts  that  ^  >  0  and  the  second  derivatives 

of the constraint functions are uniformly bounded. This result holds for large enough $k$
and positive constants $a_1'$, $a_2'$ (again, take for example $a_1' = 2a_1$, $a_2' = a_2/2$).

Rewriting this expression, we get

$\phi(1) - \phi(0) \le \sigma\phi'(0) + (1 - \sigma) p_Y^T Y^T(g - A^T\mu) - (1 - 2\sigma)\xi^T(c - s) - (1 - \sigma)\mu^T q - (\frac{1}{2} - \sigma)\rho\|c - s\|^2 + a_1'\|Z p_Z\|\,\|Y p_Y\| - a_2'\|Z p_Z\|^2 + \beta^*\|Y p_Y\|^2$.

From Lemma 4.4.3, condition C8 on the multipliers, and selecting $k$ large enough so that
$\mu^T q = 0$, it follows that

$\|g - A^T\mu\| \le \beta\|p\|$

for some constant $\beta$. Finally, we can select $\rho$ large enough so that for large $k$,

$-(1 - 2\sigma)\xi^T(c - s) - (\frac{1}{2} - \sigma)\rho\|c - s\|^2 \le -\frac{1}{2}(\frac{1}{2} - \sigma)\rho\|c - s\|^2$;

for example, let $\rho$ be larger than twice the bound given in (5.2.12). We then have

$\phi(1) - \phi(0) \le \sigma\phi'(0) - \frac{1}{2}(\frac{1}{2} - \sigma)\rho\|c - s\|^2 + a_1''\|Z p_Z\|\,\|Y p_Y\| - a_2'\|Z p_Z\|^2 + a_3\|Y p_Y\|^2$,

where $a_1'' = a_1' + \beta$ and $a_3 = \beta^* + \beta$.

Assume that $k$ is large enough so that $p$ is obtained as the solution for the QP subproblem,
the correct active set has been identified and $\rho c_i \le \lambda_i$ for all active constraints (this
follows from Lemma 3.8.3). From (5.2.5),

$\|Y p_Y\| \le \beta_{pcf}\|c\| \le \beta_{pcf}\|c - s\|$

and

$\phi(1) - \phi(0) \le \sigma\phi'(0) + \left( a_3' - \frac{1}{2}(\frac{1}{2} - \sigma)\rho \right)\|c - s\|^2 + a_1'''\|Z p_Z\|\,\|c - s\| - a_2'\|Z p_Z\|^2$,

where $a_1''' = \beta_{pcf}\, a_1''$ and $a_3' = \beta_{pcf}^2\, a_3$.

From the arithmetic mean/geometric mean inequality,

$a_1'''\|Z p_Z\|\,\|c - s\| \le \frac{1}{2}\left( a_2'\|Z p_Z\|^2 + \frac{(a_1''')^2}{a_2'}\|c - s\|^2 \right)$,   (5.3.3)

we finally obtain

$\phi(1) - \phi(0) \le \sigma\phi'(0) - \frac{1}{2} a_2'\|Z p_Z\|^2 + \left( a_3' + \frac{(a_1''')^2}{2a_2'} - \frac{1}{2}(\frac{1}{2} - \sigma)\rho \right)\|c - s\|^2$.   (5.3.4)

If $\rho$ is chosen so that

$\rho \ge \frac{4a_3' a_2' + 2(a_1''')^2}{(1 - 2\sigma)\,a_2'}$,

then the step of $\alpha = 1$ will satisfy condition (2.2.3).

Finally, applying Theorem 1 from Powell [Po78], we obtain the desired convergence
result. |
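
The threshold on the penalty parameter used at the end of the proof can be checked numerically. The sketch below assumes that the three positive constants multiplying $\|Z p_Z\|\,\|c - s\|$, $\|Z p_Z\|^2$ and $\|c - s\|^2$ in the final bound are given as plain numbers `a1`, `a2`, `a3`, and that $0 < \sigma < 1/2$. The function names and the sample values are hypothetical.

```python
def unit_step_threshold(a1, a2, a3, sigma):
    """Penalty value (4*a3*a2 + 2*a1^2) / ((1 - 2*sigma) * a2)."""
    if not (0.0 < sigma < 0.5) or a2 <= 0.0:
        raise ValueError("sketch assumes 0 < sigma < 1/2 and a2 > 0")
    return (4.0 * a3 * a2 + 2.0 * a1 ** 2) / ((1.0 - 2.0 * sigma) * a2)


def infeasibility_coefficient(a1, a2, a3, sigma, rho):
    """Coefficient of ||c - s||^2 after the AM/GM step:
    a3 + a1^2 / (2*a2) - (1/2) * (1/2 - sigma) * rho."""
    return a3 + a1 ** 2 / (2.0 * a2) - 0.5 * (0.5 - sigma) * rho


# Example call with made-up constants.
rho_min = unit_step_threshold(1.0, 2.0, 3.0, 0.25)
```

At the threshold the coefficient vanishes, and for larger penalty values it is negative, so the remaining terms on the right-hand side are nonpositive and the unit step meets the sufficient decrease test.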

Most of the proof for the previous theorem is devoted to showing that a unit steplength
is eventually acceptable if the penalty parameter is sufficiently large. Clearly, the proof
given here still holds for the algorithm presented in Chapter 4, and this gives a second set
of alternative conditions for superlinear convergence, where the condition on the multiplier
estimate C12 is replaced by a condition on the penalty parameter C12’.

5.4.  Summary 

In  this  chapter  we  have  studied  an  algorithm  similar  to  the  one  presented  in  Chapter  4,  but 
where  the  conditions  on  the  approximation  to  the  Hessian  have  been  relaxed,  so  that  now 
only  the  approximation  to  the  reduced  Hessian  is  required  to  be  positive  definite. 

The  results  obtained  have  been: 

• Under conditions C1-C9 on the search direction and multiplier estimate, and condition C10' on the approximation to the reduced Hessian, if the approximation for the rest of the Hessian is assumed to be such that $H_k$ is positive definite, then the algorithm is globally convergent.

•  An  alternative  algorithm  has  also  been  shown  to  be  globally  convergent,  where  no 
assumption  is  made  about  the  Hessian  approximation  outside  the  null  space  of  the 
active  constraints,  but  requiring  the  additional  condition: 

C13. The initial active set must be selected so that there exists an $\epsilon'' > 0$ such that if $\|p_k\| \le \epsilon''$ then the active set at $p_k$ is the initial active set.

• Finally, we have proved that the algorithm is two-step superlinearly convergent if in addition the following conditions are satisfied:

C11'. $\|Z_k^T(H_k - W_k)Z_k p_{Z_k}\| = o(\|p_k\|)$.

C12'. When the iterates are close to the solution, the penalty parameter is chosen to be large enough.

Note that when no conditions are required on the approximation to the Hessian on subspaces other than the null space of the active constraints, the algorithm leaves open the possibility of using an approximation scheme satisfying condition C11 from the previous chapter (instead of condition C11'). This would allow the algorithm to attain a one-step superlinear rate of convergence.


Chapter  6 

Exact  Second  Derivatives 

This chapter considers a third variant of the framework algorithm presented in Chapter 2. Again, a partial solution for the QP subproblem is used as the search direction, but in this case the Hessian approximation $H_k$ is taken to be the exact Hessian of the Lagrangian function at the last iterate, that is

$$H_k = \nabla^2_{xx} L(x_k, \lambda_k) = \nabla^2 F(x_k) - \sum_i \lambda_{ki} \nabla^2 c_i(x_k),$$

where now $H_k$, and even the reduced Hessian $Z_k^T H_k Z_k$, can be indefinite.

There are numerous theoretical and practical benefits deriving from the explicit use of second derivatives. For example, it will be seen in this chapter how to define an algorithm generating a sequence that converges to a second-order KKT point. Also, in practice it has been observed that second-derivative methods usually converge in far fewer iterations than those required by first-order methods. However, the use of second derivatives presents a number of technical difficulties, all of which stem from the loss of control over the properties of $H_k$. In order to reap all the benefits from the availability of second derivatives, we need to redefine the way the search direction is obtained. In all other respects the basic principles introduced in Chapter 2 will still be preserved.

The next section presents the definition of the incomplete solution for the QP subproblems, to be used as the search direction in each iteration. The rest of the chapter proves global convergence for the algorithm, and shows that under mild conditions the algorithm converges quadratically.


6.1. The search direction

The definition of the search direction given in Chapter 2 needs to be modified for the algorithm presented in this chapter, to take into account the possible lack of convexity in the subproblems, implying the possible indefiniteness of $H_k$ and rank-deficiency in the reduced Hessians.

In the case when the Hessian is indefinite, the descent directions that can be obtained from the QP subproblems may no longer provide enough descent to guarantee the convergence of the algorithm; that is, the quantities $\phi_k'(0)$ may no longer be sufficiently negative to ensure that $\phi_k - \phi_{k+1}$ satisfies the condition used in the proofs of Theorems 4.3.1 and 5.2.1. In this section we present a procedure to generate search directions that either give sufficient descent, or are directions of negative curvature (satisfying $p_k^T H_k p_k < 0$) allowing a sufficient decrease in the value of the merit function to ensure convergence. The search direction $p_k$ is defined by the following steps:

(i) Obtain a feasible initial point $p_0$ for the QP subproblem such that conditions (2.2.6) and (2.2.7) are satisfied.

(ii) Solve the QP subproblem until a stationary point p is found, or until a direction of infinite descent d is obtained. The convergence results presented in this chapter do not assume the use of any specific QP algorithm, but the following conditions must be satisfied by the method selected.

• It must be an active-set algorithm, taking feasible descent steps in each iteration. If steps having a positive directional derivative at $\alpha = 0$ are taken, the total descent must be uniformly bounded away from zero.

• It must be able to find a stationary point (or a direction of infinite descent) in a number of iterations uniformly bounded by a function of the size of the problem.

• Each QP iteration must produce a minimum descent, unless we are at a stationary point for the QP subproblem. To be more precise, let p denote any intermediate point along the solution of the QP subproblem and let d be the QP search direction at p; also let $\alpha$ indicate the step taken from p along d, obtained as the minimum of the steps to the unidimensional minimizer, the nearest inactive constraint and a specified upper bound, in the same spirit as in the definition of $\alpha$ given in condition C3. Finally, let $g_N$ denote the projection of $g + Hp$ onto the null space of the active QP constraints at p. We require that d satisfies the following condition:

$$\frac{\psi(p) - \psi(p + \alpha d)}{\|d\|} \ge \beta_{qpd}\|g_N\|, \tag{6.1.1}$$

where $\beta_{qpd}$ is some positive constant.

The reason for this condition is that it prevents the algorithm from taking steps that give arbitrarily small descent unless $\|g_N\|$ is small, that is, the point p is close to being a QP stationary point.
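
The step rule described in the last condition can be illustrated with a minimal sketch. It assumes a QP objective of the form $\psi(p) = g^T p + \tfrac12 p^T H p$ and constraints written as $Ap \ge b$; the function name `qp_step`, the argument layout and the tolerance are hypothetical choices made for this illustration, not part of the algorithm as stated. The step is the minimum of the step to the unidimensional minimizer, the step to the nearest inactive constraint, and an upper bound, and the achieved decrease $\psi(p) - \psi(p + \alpha d)$ is returned alongside it.

```python
import math


def qp_step(g, H, A, b, p, d, alpha_max=1.0, tol=1e-12):
    """Step from p along d for psi(p) = g.p + 0.5 p.H.p subject to A p >= b.

    Assumes p is feasible and d is a descent or negative-curvature direction.
    Returns (alpha, decrease) with decrease = psi(p) - psi(p + alpha d).
    """
    n = len(p)
    Hp = [sum(H[i][j] * p[j] for j in range(n)) for i in range(n)]
    Hd = [sum(H[i][j] * d[j] for j in range(n)) for i in range(n)]
    slope = sum((g[i] + Hp[i]) * d[i] for i in range(n))  # (g + Hp)^T d
    curv = sum(d[i] * Hd[i] for i in range(n))            # d^T H d

    # Step to the one-dimensional minimizer; infinite without positive curvature.
    alpha_min = -slope / curv if (curv > tol and slope < 0.0) else math.inf

    # Step to the nearest inactive constraint that d moves towards.
    alpha_con = math.inf
    for a_i, b_i in zip(A, b):
        resid = sum(a_i[j] * p[j] for j in range(n)) - b_i
        a_d = sum(a_i[j] * d[j] for j in range(n))
        if resid > tol and a_d < -tol:
            alpha_con = min(alpha_con, resid / -a_d)

    alpha = min(alpha_min, alpha_con, alpha_max)
    decrease = -(alpha * slope + 0.5 * alpha ** 2 * curv)
    return alpha, decrease
```

When `curv` is not positive the first candidate is infinite, which corresponds to d being a direction of infinite descent for the unconstrained quadratic; the step is then limited only by the constraints and the upper bound.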

(iii) Define $\bar p$ from p or d as follows.

(a) If a direction of infinite descent d satisfying (6.1.1) is obtained at a point p along the solution of the QP subproblem, define

$$\bar p = p + \alpha d,$$

where $\alpha > 0$ is chosen so that $\|\bar p\|$ is uniformly bounded above and below.

(b) If p is a second-order KKT point for the QP subproblem, let

$$\bar p = p.$$

(c) Otherwise, select $\bar p$ by computing a direction d and a steplength $\alpha$ satisfying conditions C1-C6.

(iv) The following condition is introduced to identify the circumstances under which near singularity in the reduced Hessian may be a problem:

C14. $\|c^-\| \le \epsilon_c$, and

$$\frac{\psi(p_0) - \psi(\bar p)}{\|p_0 - \bar p\|} \le \beta_d.$$

If C14 holds, obtain an estimate for the active set at the current point, and compute a direction $\hat p$ by taking a step $\alpha d$ from $p_0$ satisfying C1-C6. If no feasible step satisfying these conditions exists, let $\hat p = p_0$.

(v) Select the search direction $p_k$ as

$$p_k = \begin{cases} \bar p & \text{if } \psi(\bar p) \le \psi(\hat p), \text{ C14 does not hold, or } \hat p = p_0, \\ \hat p & \text{otherwise.} \end{cases}$$

Several remarks are in order regarding the definition of $p_k$. Condition (6.1.1) could be replaced by the alternative condition

$$\frac{(g + Hp)^T d}{\|d\|} \le -\beta_{qpd}\|g_N\|,$$

which may provide a better expression for the stated goal of linking the lack of descent associated with the direction d and the proximity to a QP stationary point; but this is achieved at the expense of limiting the choice in the selection of directions of negative curvature.

In point (iv) it is required that the correct active set at a nearby stationary point should be identified. Under condition (6.1.1), an estimate for this active set having the desired properties is given by the QP active set at the initial point for the first finite QP step (the first step that is bounded away from zero).

Finally, condition C5 requires the computation of a direction of negative curvature. In the case when n is small this is straightforward. For the large-scale case, efficient methods are known when the reduced Hessian is not too large. Although some work has been carried out for problems of arbitrary size, see for example Conn and Gould [CG84], such methods are not very efficient. Our hope is that satisfactory methods for computing feasible directions of negative curvature for arbitrarily large problems will be developed in the near future. If a direction of negative curvature is not determined, the proofs would still hold if we characterize solution points to be first-order KKT points for the problem (instead of second-order KKT points).

Properties  of  the  search  direction 

As in the previous chapters, the first result required for the convergence proof is to show that if $\|p\|$ is small enough, the correct active set must have been identified. We start by introducing the following constant, implied by the non-singularity assumption A6.

$\beta_{svH}$ is a positive lower bound for the smallest eigenvalue of the reduced Hessian of the Lagrangian function at all second-order KKT points for the NLP problem in $\Omega$.

The following lemma establishes property P1 for this algorithm.

Lemma 6.1.1. There exists an $\epsilon > 0$ such that $\|p\| \le \epsilon$ implies that p was obtained as a second-order KKT point of the QP subproblem and the correct active set has been identified.

Proof. The correct identification of the active set follows from strict complementarity at the solution point (see proof for Lemma 4.3.2).

Assume that the lemma does not hold, in the sense that there exists a sequence $\{x_k\}$ such that $x_k \to x^*$ and $\|p_k\| \to 0$, where $p_k$ denotes the search direction obtained for the QP subproblem at $x_k$ in the form described in the previous section, but $p_k$ has not been obtained as a second-order KKT point for the QP subproblem.

If $p_k = \hat p_k$ and $\|\bar p_k\| \ge \bar\epsilon$ for an infinite subsequence and some $\bar\epsilon > 0$, then as $p_k$ must be feasible, we must have $\|c_k^-\| \to 0$. Also, as $\psi_k(p_k) \to 0$, we must have $\psi_k(\bar p_k) \to 0$. From this and condition (6.1.1) it must follow that $x^*$ is a stationary point for the NLP problem, given that it is feasible and in the QP subproblem we have no descent when taking a nonzero step from the origin to a stationary point.

If $x^*$ is a second-order KKT point, eventually $\hat p_k = p_{k0}$ and $p_k = \bar p_k$. If $x^*$ is a stationary point but not a second-order KKT point, for $\|x_k - x^*\|$ small enough we can find a direction $d_k$ and a steplength $\alpha_k$ such that $p_{k0} + \alpha_k d_k$ is feasible, as $\|p_{k0}\| \to 0$ and the information used is asymptotically correct. From the bound given in (1.3.1) and condition C1, both $\alpha_k$ and $\|d_k\|$ are bounded below by positive constants, implying that $\|\hat p_k\| = \|p_{k0} + \alpha_k d_k\|$ is bounded away from zero. However, this contradicts our hypothesis.

Assume now that $\|\bar p_k\| \to 0$. From condition C6, this implies that the QP stationary point from which $\bar p_k$ was obtained also converges to zero, and from Lemma 3.3.1 we must have that $x^*$ is a stationary point. Suppose $x^*$ is a second-order KKT point. Then strict complementarity at $x^*$ and the fact that $\|p_k\| \to 0$ imply that the correct active set is eventually identified. Hence, from the positive definiteness of the reduced Hessian at $x^*$, we must have that for large enough k, $p_k$ is a second-order KKT point for the QP subproblem.

If $x^*$ is a stationary point, but not a second-order KKT point, using the bounds given in Section 1.3 and assuming $\|x_k - x^*\|$ to be small enough, we can find a direction $d_k$ and a steplength $\alpha_k$, both bounded below by positive constants, implying that $\|\bar p_k\|$ is bounded away from zero. Again, this is a contradiction. ∎

As in previous chapters, the proof proceeds by showing that property P3 holds for this algorithm, that is, the search direction computed according to the rules introduced in Section 6.1 satisfies a descent condition.

In order to prove P3, we need a preliminary result. In Chapters 4 and 5 it was possible to show that

$$\psi_k(p_{k0}) - \psi_k(p_k) \to 0 \;\Rightarrow\; \|p_{k0} - p_k\| \to 0,$$

using the positive definiteness of $H_k$, or of $Z_k^T H_k Z_k$ at least. This argument is not valid in this case, and we give an alternative proof for the result in the next lemmas.

In the following lemmas the notation $\{y_m\}_{m=1}^{\infty}$ is used to represent a subsequence from the sequence of iterates, $\{y_m\} \subset \{x_k\}$. The symbol $c_m$ denotes the vector $c(y_m)$, $H_m$ corresponds to the Hessian of the Lagrangian function at $y_m$, and $p_m$ indicates the search direction obtained at $y_m$.

Lemma 6.1.2. If the convergent sequence $\{y_m\}$, $y_m \to y^*$, satisfies $\|c_m^-\| \to 0$, it must hold that

$$\psi_m(p_m) \to 0 \;\Rightarrow\; \|p_m\| \to 0,$$

where $p_m$ denotes the search direction obtained from the process described above. Also, $y^*$ must be a stationary point of the NLP problem.

Proof. Assume that the lemma does not hold, i.e., that $\psi_m(p_m) \to 0$ but $\|p_m\| \ge \delta > 0$ for all m.

Since the norm of the initial QP point goes to zero ($\|p_{m0}\| \to 0$), condition C14 must hold for large enough m.

To show that $y^*$ is a stationary point, take a subsequence along which the number of QP steps is fixed (it is bounded), and all intermediate steps converge to limit points; in the limit all steps give zero descent, as $\psi_m(p_m) \to 0$, implying that all intermediate points, and in particular the origin, must be stationary points from condition (6.1.1).

Assume that $y^*$ is a second-order KKT point, and that a set of limit points for intermediate steps has been obtained as indicated in the previous paragraph. For the first nonzero step from the origin, $d^*$, it must hold that $\|d^*\| > 0$, as otherwise we would have $d^{*T} Z^{*T} H^* Z^* d^* = 0$, contradicting assumption A6. But then $g^{*T} d^* > 0$, violating the first condition imposed on the QP solution method.

If $y^*$ is not a second-order KKT point, it follows that at $y^*$ there exists either a direction of negative curvature or a negative multiplier. Since $\mu_m \to \mu^*$ (the Jacobian of the active constraints at $y^*$ has full rank), then from the bounds introduced in (1.3.1) and Lemma 3.3.2, it follows for m large enough that $\psi_m(p_m)$ is bounded above by a negative constant, both when there exists a direction of negative curvature and when there exists a negative multiplier.

Consequently, in either case $\psi_m(p_m)$ is bounded away from zero, which contradicts our assumption. ∎

Lemma 6.1.3. There exists a constant $\epsilon_c > 0$ such that for any sequence $\{y_m\}$ satisfying $\|c_m^-\| \le \epsilon_c$ we must have

$$\psi_m(p_{m0}) - \psi_m(p_m) \to 0 \;\Rightarrow\; \|p_{m0} - p_m\| \to 0.$$

Proof. Assume that the result does not hold. Consider any sequence $\{\epsilon_j\}$ such that $\epsilon_j \to 0$ and $\epsilon_j \le \epsilon_1$. For each $\epsilon_j$, we can construct a sequence $\{y_l^j\} \subset \{y_m\}$ such that $\|c_l^{j-}\| \le \epsilon_j$ for all l, $y_l^j \to y_j^*$ as $l \to \infty$ for all j, and $\psi_l^j(p_{l0}^j) - \psi_l^j(p_l^j) \to 0$ but $\|p_{l0}^j - p_l^j\| \ge \delta_j$ for some $\delta_j > 0$ for all l. Finally, we can assume that $y_j^* \to y^*$.

From the previous properties, condition C14 must hold eventually for any of the sequences. Select one element from each sequence, $\hat y_j = y_{l_j}^j$, such that for that point C14 is satisfied and $\hat y_j \to y^*$. Then from the previous lemma we must have that $p_j \to 0$ and $y^*$ is a stationary point of the problem.

Using the same arguments as in Lemma 6.1.2, if $y^*$ is not a second-order KKT point, then at $y^*$ we will have either a direction of negative curvature or a negative multiplier, and since $\mu_j \to \mu^*$ (the Jacobian at $y^*$ has full rank from assumption A3), and a similar property holds for the reduced Hessian, we must have that $\psi_j(p_{j0}) - \psi_j(p_j)$ is bounded below by a positive constant, contradicting our assumption.

If $y^*$ is a second-order KKT point, then consider the sequence $\{\hat y_j\}$. For this sequence and for j large enough, $p_{j0}$ (the initial point for the QP subproblem) must be a second-order KKT point. This follows from condition (6.1.1), implying that all $p_{j0}$ must be QP stationary points, and from $\|p_{j0}\| \to 0$, the identification of the correct active set from strict complementarity at $y^*$ and assumption A6. But from arguments used in the previous lemma, the fact that we have no descent from $p_{j0}$ implies that the reduced Hessian must be singular at $p_{j0}$ for large enough j, and the reduced Hessian must also be singular at $y^*$, contradicting assumption A6. ∎

We  can  now  prove  property  P3  for  the  algorithm. 


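Lemma 6.1.4. There exist constants $\beta_1 > 0$ and $\beta_2 > 0$ such that

$$g_k^T p_k + \tfrac12 p_k^T H_k p_k \le -\beta_1\|p_k\|^2 + \beta_2\|r_k\|. \tag{6.1.2}$$

Proof. Define $\epsilon_H$ satisfying $\epsilon \ge \epsilon_H > 0$, where $\epsilon$ is the value from Lemma 6.1.1, and such that $\|p\| \le \epsilon_H$ implies that p is a second-order KKT point, the correct active set has been identified, and the smallest eigenvalue for the reduced Hessian is greater than $\tfrac12\beta_{svH}$.

Also, from Lemma 6.1.3, let $\Delta > 0$ be the value such that, if $\|c^-\| \le \epsilon_c$,

$$\|p_0 - p\| \ge \tfrac12\epsilon_H \;\Rightarrow\; \psi(p_0) - \psi(p) \ge \Delta.$$

Define

$$\epsilon' = \frac{\Delta}{2\beta_{nmg} + \beta_{nmH}\beta_{nmp}},$$

having the property that $\|p_0\| \le \epsilon'$ implies $|\psi(p_0)| \le \tfrac12\Delta$. Select

$$\epsilon_1 = \min\left(\epsilon_c, \frac{\epsilon'}{\beta_{pcs}}, \frac{\epsilon_H}{2\beta_{pcs}}\right).$$

From condition (2.2.6) and assumption A2, there exists a constant $\beta_{nmp}$ such that

$$\|p_0\| \le \beta_{pcs}\|r\| \le \beta_{nmp}.$$

One of the following conditions must hold:

• $\|r\| \ge \epsilon_1$. From the boundedness of $\|p_0\|$ we can write

$$\psi(p) = g^T p + \tfrac12 p^T H p \le \psi(p_0) \le \beta_{nmp}(\beta_{nmg} + \tfrac12\beta_{nmH}\beta_{nmp})$$
$$\le -\beta_1\|p\|^2 + \frac{\beta_{nmp}}{2\epsilon_1}\left(2\beta_1\beta_{nmp} + 2\beta_{nmg} + \beta_{nmH}\beta_{nmp}\right)\|r\|.$$

• $\|r\| \le \epsilon_1$ and $\|p\| \ge \epsilon_H$. This implies $\|p_0\| \le \beta_{pcs}\epsilon_1 \le \epsilon'$ and $|\psi(p_0)| \le \tfrac12\Delta$. Also, $\|p_0 - p\| \ge \epsilon_H - \beta_{pcs}\epsilon_1 \ge \tfrac12\epsilon_H$, and

$$\psi(p_0) - \psi(p) \ge \Delta \;\Rightarrow\; \psi(p) \le -\tfrac12\Delta \;\Rightarrow\; \psi(p) \le -\frac{\Delta}{2\beta_{nmp}^2}\|p\|^2.$$

• $\|r\| \le \epsilon_1$ and $\|p\| \le \epsilon_H$. In this case, as p is a second-order KKT point for the QP subproblem,

$$g^T p + p^T H p = -\mu^T c \le \beta_{nm\mu}\|c^-\| \le \beta_{nm\mu}\|r\|.$$

Using the notation $\Delta p = p - p_0$,

$$p^T H p = p_0^T H p_0 + 2\Delta p^T H p_0 + \Delta p^T H \Delta p \ge -\beta_{nmH}\beta_{pcs}^2\|r\|^2 - 2\beta_{nmH}\beta_{pcs}\|r\|\|\Delta p\| + \tfrac12\beta_{svH}\|\Delta p\|^2,$$

and from the arithmetic mean/geometric mean inequality,

$$2\|r\|\|\Delta p\| \le \frac{4\beta_{pcs}\beta_{nmH}}{\beta_{svH}}\|r\|^2 + \frac{\beta_{svH}}{4\beta_{pcs}\beta_{nmH}}\|\Delta p\|^2,$$

we obtain

$$p^T H p \ge \tfrac14\beta_{svH}\|\Delta p\|^2 - \beta_{nmH}\beta_{pcs}^2\left(1 + \frac{4\beta_{nmH}}{\beta_{svH}}\right)\|r\|^2.$$

The inequalities

$$\tfrac12\|p\|^2 \le \tfrac12\|\Delta p\|^2 + \tfrac12\|p_0\|^2 + \|\Delta p\|\|p_0\| \le \|\Delta p\|^2 + \|p_0\|^2$$

imply that we can write

$$p^T H p \ge \tfrac18\beta_{svH}\|p\|^2 - K\|r\|^2,$$

where

$$K = \beta_{pcs}^2\left(\beta_{nmH}\left(1 + \frac{4\beta_{nmH}}{\beta_{svH}}\right) + \tfrac14\beta_{svH}\right).$$

Putting all these results together, we have

$$\psi(p) \le \beta_{nm\mu}\|r\| - \tfrac12 p^T H p \le -\tfrac1{16}\beta_{svH}\|p\|^2 + (\tfrac12 K + \beta_{nm\mu})\|r\|,$$

completing the proof. ∎

6.2. Definition of the linesearch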

As a consequence of the way we have defined the matrices $H_k$ and the incomplete solutions for the QP subproblems in this chapter, the search direction $p_k$ may no longer be a descent direction, but rather a direction of negative curvature. The linesearch model presented in the previous chapters is not adequate for this case. We can no longer be assured that the directional derivative at the beginning of the linesearch is bounded by a multiple of $\|p_k\|^2$. The structure of the global convergence proof would then fail to hold. We need to modify the linesearch model introduced in Chapter 2, and we will do so according to the ideas introduced in McCormick [McC77], and further developed in Moré and Sorensen [MS84].

The problem considered in [MS84] is that of minimizing an unconstrained function when in each iteration a direction of descent v, or a direction of negative curvature w, or both, are available. The search is carried out along the curve $C = \{x(\alpha) : x(\alpha) = x + \alpha w + \alpha^2 v\}$, and the termination conditions when the direction of negative curvature is available are specified in terms of the curvature at the initial point. In our case we generate only one search direction $p_k$ for the original variables x in each iteration, but the search on the merit function is made not only in the space of the original variables, but also in the space of the Lagrange multipliers and the slack variables. Whenever we make use of $p_k$ as a direction of negative curvature, we need to define not just one search direction but both a direction of descent and a direction of negative curvature in this expanded space. If $p_k$ can be treated as a direction of descent, we prefer to avoid the complications associated with the curvilinear search by reverting to the linesearch model introduced in Chapter 2.

The next paragraphs present the definitions of the expanded directions for the curvilinear search. To motivate them, we start by studying the form of the derivatives for the merit function along the curve C. We define the unidimensional merit function along the curve of search, $\phi^C$, starting from the point y and moving along the vectors $\bar v$ and $\bar w$, where

$$y = \begin{pmatrix} x \\ \lambda \\ s \end{pmatrix}, \quad \bar v = \begin{pmatrix} v \\ t_1 \\ u_1 \end{pmatrix}, \quad \bar w = \begin{pmatrix} w \\ t_2 \\ u_2 \end{pmatrix},$$

as

$$\phi^C(\alpha) = L_A(y + \alpha^2\bar v + \alpha\bar w) = F(x_\alpha) - \phi_1^C(\alpha) + \rho\,\phi_2^C(\alpha),$$

where

$$\phi_1^C(\alpha) = \lambda_\alpha^T(c(x_\alpha) - s_\alpha),$$
$$\phi_2^C(\alpha) = \tfrac12\|c(x_\alpha) - s_\alpha\|^2.$$

To simplify the expressions that appear in the analysis of the different functions related to the merit function, we introduce the notation

$$x_\alpha = x + \alpha^2 v + \alpha w,$$
$$\lambda_\alpha = \lambda + \alpha^2 t_1 + \alpha t_2,$$
$$s_\alpha = s + \alpha^2 u_1 + \alpha u_2.$$

In the case when a normal linesearch is performed, the value of the merit function along the line of search will be denoted by $\phi^N$. This linesearch can be viewed as a particular case of the curvilinear search, when $w = 0$, and in fact for the definitions of the vectors $t_i$ and $u_i$ given in this section the form of the search directions is identical if we let $w = 0$, but it must be noted that the termination conditions are different in the two cases.

Our interest in what follows is to assign values to $u_i$ and $t_i$ in terms of the known quantities at the current point; the definitions for v and w will be specified later as a function of the properties of the search direction $p_k$. In order to identify satisfactory values for these vectors in the curvilinear search, we need to study the form of the first and second derivatives of the merit function at zero, as these are the values that will be used in the termination criteria. We start by forming the corresponding derivatives at any point. The first derivative is given by

$$\phi^{C\prime}(\alpha) = \nabla F(x_\alpha)^T(2\alpha v + w) - \phi_1^{C\prime}(\alpha) + \rho\,\phi_2^{C\prime}(\alpha),$$

where

$$\phi_1^{C\prime}(\alpha) = (2\alpha t_1 + t_2)^T(c(x_\alpha) - s_\alpha) + \lambda_\alpha^T\left(\nabla c(x_\alpha)(2\alpha v + w) - 2\alpha u_1 - u_2\right)$$

and

$$\phi_2^{C\prime}(\alpha) = (c(x_\alpha) - s_\alpha)^T\left(\nabla c(x_\alpha)(2\alpha v + w) - 2\alpha u_1 - u_2\right).$$

For the second derivative we have

$$\phi^{C\prime\prime}(\alpha) = (2\alpha v + w)^T\nabla^2 F(x_\alpha)(2\alpha v + w) + 2\nabla F(x_\alpha)^T v - \phi_1^{C\prime\prime}(\alpha) + \rho\,\phi_2^{C\prime\prime}(\alpha),$$

where

$$\phi_1^{C\prime\prime}(\alpha) = 2(2\alpha t_1 + t_2)^T\left(\nabla c(x_\alpha)(2\alpha v + w) - 2\alpha u_1 - u_2\right) + 2t_1^T(c(x_\alpha) - s_\alpha)$$
$$\qquad + \lambda_\alpha^T\left(2\nabla c(x_\alpha)v - 2u_1\right) + \sum_i \lambda_{\alpha i}(2\alpha v + w)^T\nabla^2 c_i(x_\alpha)(2\alpha v + w)$$

and

$$\phi_2^{C\prime\prime}(\alpha) = \|\nabla c(x_\alpha)(2\alpha v + w) - 2\alpha u_1 - u_2\|^2 + (c(x_\alpha) - s_\alpha)^T\left(2\nabla c(x_\alpha)v - 2u_1\right)$$
$$\qquad + \sum_i (c_i(x_\alpha) - s_{\alpha i})(2\alpha v + w)^T\nabla^2 c_i(x_\alpha)(2\alpha v + w).$$

As we mentioned earlier, we are interested in studying the values of these derivatives when $\alpha = 0$, given that the termination criteria for the linesearch make use of these values; their form will determine the definition of $u_i$ and $t_i$. For the first derivative we have

$$\phi^{C\prime}(0) = g^T w - t_2^T(c - s) - \lambda^T(Aw - u_2) + \rho(c - s)^T(Aw - u_2),$$

and letting

$$u_2 = Aw, \quad t_2 = 0, \tag{6.2.2}$$

we obtain

$$\phi^{C\prime}(0) = g^T w. \tag{6.2.3}$$


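For the second derivative,

$$\phi^{C\prime\prime}(0) = w^T\nabla^2 F w + 2g^T v - 2t_1^T(c - s) - 2t_2^T(Aw - u_2) - 2\lambda^T(Av - u_1)$$
$$\qquad + \sum_i\left(\rho(c_i - s_i) - \lambda_i\right)w^T\nabla^2 c_i w + \rho\|Aw - u_2\|^2 + 2\rho(c - s)^T(Av - u_1),$$

and after replacing the expressions for $u_2$ and $t_2$, we obtain

$$\phi^{C\prime\prime}(0) = w^T\nabla^2 F w + 2g^T v - 2t_1^T(c - s) + 2\left(\rho(c - s) - \lambda\right)^T(Av - u_1)$$
$$\qquad + \sum_i\left(\rho(c_i - s_i) - \lambda_i\right)w^T\nabla^2 c_i w.$$

Define

$$u_1 = Av + c - s + \omega, \quad t_1 = \mu - \lambda, \tag{6.2.4}$$

for some vector $\omega$ to be defined later on, implying

$$\phi^{C\prime\prime}(0) = w^T\nabla^2 L w + 2g^T v + 2(2\lambda - \mu)^T(c - s) - 2\rho\|c - s\|^2$$
$$\qquad + 2\omega^T\left(\lambda - \rho(c - s)\right) + \sum_i \rho(c_i - s_i)w^T\nabla^2 c_i w. \tag{6.2.5}$$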
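
The closed forms for the derivatives at $\alpha = 0$ can be sanity-checked by finite differences. The sketch below is only an illustration: the two-variable objective `F`, the single constraint `c`, and all numerical values (including `mu` and `omega`) are arbitrary assumptions chosen for the test, not data from the text. It builds the curve with $u_2 = Aw$, $t_2 = 0$, $u_1 = Av + c - s + \omega$ and $t_1 = \mu - \lambda$, and compares central differences of the merit function along the curve with $g^T w$ and with the expression for the second derivative in terms of $\nabla^2 L = \nabla^2 F - \lambda\nabla^2 c$.

```python
def F(x):
    return (x[0] - 1.0) ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2


def c(x):
    return x[0] ** 2 + x[1] - 1.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def quad(M, z):
    return dot(z, [dot(row, z) for row in M])


def phi(alpha, x, lam, s, rho, v, w, t1, t2, u1, u2):
    """Merit function F - lam*(c - s) + 0.5*rho*(c - s)^2 along the curve."""
    xa = [x[i] + alpha ** 2 * v[i] + alpha * w[i] for i in range(2)]
    la = lam + alpha ** 2 * t1 + alpha * t2
    sa = s + alpha ** 2 * u1 + alpha * u2
    e = c(xa) - sa
    return F(xa) - la * e + 0.5 * rho * e ** 2


# Arbitrary illustrative data (one constraint, two variables).
x, lam, s, rho = [0.5, 0.3], 0.7, 0.2, 10.0
v, w = [0.4, -0.1], [-0.3, 0.8]
mu, omega = 1.1, 0.05

g = [2.0 * (x[0] - 1.0) + x[1], x[0] + 4.0 * x[1]]   # gradient of F
A = [2.0 * x[0], 1.0]                                # gradient of c
hess_F = [[2.0, 1.0], [1.0, 4.0]]
hess_c = [[2.0, 0.0], [0.0, 0.0]]

e0 = c(x) - s
u2, t2 = dot(A, w), 0.0                              # choice made for the first derivative
u1, t1 = dot(A, v) + e0 + omega, mu - lam            # choice made for the second derivative

d1_formula = dot(g, w)
wLw = quad(hess_F, w) - lam * quad(hess_c, w)
d2_formula = (wLw + 2.0 * dot(g, v) + 2.0 * (2.0 * lam - mu) * e0
              - 2.0 * rho * e0 ** 2 + 2.0 * omega * (lam - rho * e0)
              + rho * e0 * quad(hess_c, w))

h = 1e-4
args = (x, lam, s, rho, v, w, t1, t2, u1, u2)
d1_fd = (phi(h, *args) - phi(-h, *args)) / (2.0 * h)
d2_fd = (phi(h, *args) - 2.0 * phi(0.0, *args) + phi(-h, *args)) / h ** 2
```

Agreement of `d1_fd` with `d1_formula` and of `d2_fd` with `d2_formula` to within finite-difference accuracy is what the derivation predicts for any choice of the data.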

To make sure that the last terms in (6.2.5) take acceptable values, we select $\omega$ to satisfy

$$\omega_i = \begin{cases} 0 & \text{if } (c_i - s_i)w^T\nabla^2 c_i w \le 0,\ |w^T\nabla^2 c_i w| \le |c_i - s_i|, \\ & \text{or } \sum_i (c_i - s_i)w^T\nabla^2 c_i w \le \|c - s\|^2; \\[4pt] -\dfrac{\rho}{2}\,\dfrac{(c_i - s_i)w^T\nabla^2 c_i w}{\lambda_i - \rho(c_i - s_i)} & \text{otherwise.} \end{cases}$$

If $\lambda_i - \rho(c_i - s_i)$ is very small or zero, and the first set of conditions does not apply, this definition is unsatisfactory because $\omega_i$ is either undefined or unacceptably large. To avoid this problem, we modify the current value of $\rho$, attempting to attain two goals: we want the new value for $\rho$, say $\bar\rho$, to be bounded by a finite multiple of its existing value, and we want $\omega$ to be bounded by a multiple of $\|w\|^2$. We start by imposing the following condition:

$$\left|\frac{\rho}{2}\,\frac{c_i - s_i}{\lambda_i - \rho(c_i - s_i)}\right| \le K \tag{6.2.6}$$

for some $K > 1$. Note that this bound implies that our second goal, $\|\omega\| = O(\|w\|^2)$, is attained.

We now show that our first goal can also be achieved. If the previous condition is not satisfied for the current value of $\rho$, then we must have

$$\left|\frac{\lambda_i}{\rho(c_i - s_i)} - 1\right| < \frac{1}{2K}, \tag{6.2.7}$$

and for that to hold it must also be true that $\lambda_i(c_i - s_i) > 0$, so we can write

$$\frac{\lambda_i}{c_i - s_i}\,\frac{2K}{2K + 1} < \rho < \frac{\lambda_i}{c_i - s_i}\,\frac{2K}{2K - 1}; \tag{6.2.8}$$

but if $\rho$ is in this interval, then

$$\rho\,\frac{2K + 1}{2K - 1} > \frac{\lambda_i}{c_i - s_i}\,\frac{2K}{2K - 1},$$

and in general there exists a value

$$\bar\rho \in \left[\rho,\ \rho\,\frac{2K + 1}{2K - 1}\right] \tag{6.2.9}$$

for which the desired bound on $\omega$ holds. With this definition,

$$-2\rho\|c - s\|^2 + 2\omega^T\left(\lambda - \rho(c - s)\right) + \sum_i \rho(c_i - s_i)w^T\nabla^2 c_i w \le -\rho\|c - s\|^2. \tag{6.2.10}$$
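
The adjustment of the penalty parameter can be sketched for a single constraint as follows. This is a simplified illustration under stated assumptions: the helper names `omega_ratio` and `adjust_rho` are hypothetical, only one constraint is considered, and the sketch always jumps to the upper end $\rho(2K+1)/(2K-1)$ of the admissible interval, whereas the text only asserts that some value in that interval works when several constraints are present.

```python
def omega_ratio(rho, lam_i, viol_i):
    """Quantity bounded in (6.2.6): |(rho/2) (c_i - s_i) / (lam_i - rho (c_i - s_i))|.

    viol_i stands for c_i - s_i.
    """
    denom = lam_i - rho * viol_i
    if denom == 0.0:
        return float("inf")
    return abs(0.5 * rho * viol_i / denom)


def adjust_rho(rho, lam_i, viol_i, K):
    """Return rho unchanged if (6.2.6) holds, else rho * (2K + 1) / (2K - 1)."""
    if omega_ratio(rho, lam_i, viol_i) <= K:
        return rho
    return rho * (2.0 * K + 1.0) / (2.0 * K - 1.0)
```

For instance, with `K = 2`, `lam_i = 1` and `viol_i = 0.1`, the value `rho = 10` makes the denominator vanish, and the increased value satisfies the bound while staying within the factor $(2K+1)/(2K-1)$ of the original.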


Negative  curvature  and  descent 

We now present the rules to decide how to select the linesearch model used in each iteration, and if the curvilinear search is to be used, how to define the values for v and w. Once the search direction p has been computed, let

a) $v = 0$, $w = p$ if $p^T H p < 2g^T p \le 0$,

b) $v = (1 + \gamma)p$, $w = -\gamma p$ if $p^T H p < 0$, $g^T p > 0$ and $-p^T H p \ge \kappa g^T p$,

c) use a normal linesearch otherwise,

where $\kappa$ is a constant satisfying $0 < \kappa < 1$, and $\gamma$ is defined from


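The convergence proofs make use of several properties that follow from the definitions of v and w. If we define

$$f_p = \begin{cases} 2g^T v + w^T H w & \text{for cases a) and b),} \\ 2g^T p & \text{for case c),} \end{cases}$$

then for the different cases,

a) $f_p = p^T H p \le g^T p + \tfrac12 p^T H p$,

b) $f_p = 2(\gamma + 1)g^T p + \gamma^2 p^T H p \le g^T p - (\gamma^2 - \tfrac12)p^T H p + \gamma^2 p^T H p = g^T p + \tfrac12 p^T H p$,

c) $f_p = 2g^T p \le g^T p + \tfrac12 p^T H p$ if $2g^T p \le p^T H p$,

$f_p = 2g^T p \le 2g^T p + p^T H p$ if $0 \le p^T H p \le 2g^T p$,

$f_p = 2g^T p \le 2g^T p + \frac{2}{2 - \kappa}(\kappa g^T p + p^T H p) = \frac{2}{2 - \kappa}(2g^T p + p^T H p)$ otherwise.

From (6.1.2) and these results,

$$f_p \le \min\left(-\beta_1\|p\|^2 + \beta_2\|r\|,\ \frac{4}{2 - \kappa}\left(-\beta_1\|p\|^2 + \beta_2\|r\|\right)\right) \le -\beta_1\|p\|^2 + 4\beta_2\|r\|. \tag{6.2.11}$$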

A second useful inequality is

$$f_p \le 2g^T p, \tag{6.2.12}$$

following from one of the alternative cases

a) $f_p = p^T H p \le 2g^T p$,

b) $f_p = 2(\gamma + 1)g^T p + \gamma^2 p^T H p \le \left(2(\gamma + 1) - \kappa\gamma^2\right)g^T p \le 2g^T p$,

c) $f_p = 2g^T p$.
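
The case selection and the quantity $f_p$ can be written out as a short sketch. It is illustrative only: the function name `select_search_model` is hypothetical, and because the defining formula for $\gamma$ is not reproduced here, `gamma` is simply passed in as a parameter whose value is an assumption of the caller.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def select_search_model(g, H, p, kappa, gamma):
    """Classify p into case a), b) or c); return (case, v, w, f_p).

    For case c) a normal linesearch is used, so v and w are returned as None.
    """
    gTp = dot(g, p)
    pHp = dot(p, [dot(row, p) for row in H])
    if pHp < 2.0 * gTp <= 0.0:
        # Case a): p itself is used as the direction of negative curvature.
        case, v, w = "a", [0.0] * len(p), list(p)
    elif pHp < 0.0 and gTp > 0.0 and -pHp >= kappa * gTp:
        # Case b): v + w = p, with w pointing along -p.
        case = "b"
        v = [(1.0 + gamma) * pi for pi in p]
        w = [-gamma * pi for pi in p]
    else:
        return "c", None, None, 2.0 * gTp
    wHw = dot(w, [dot(row, w) for row in H])
    return case, v, w, 2.0 * dot(g, v) + wHw
```

In both curvilinear cases the point reached at $\alpha = 1$ is $x + v + w = x + p$, so the unit step coincides with the full QP step.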

Another  interesting  property  of  the  previous  definition  is  given  in  the  next  lemma. 


Lemma  6.2.1.  There  exists  an  ej  >  0  such  that  ||Pfc 1 1  <  £ d<  then  a  normal  lincsearch  is 
uscel. 


Proof.  Assume  that  the  lemma  does  not  hold.  Then  there  exists  a  sequence  {-c*-},  and 
an  associated  sequence  of  search  directions  {pc},  such  that,  pc  — >  0  and  p c  satisfies  the 
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conditions  I'or  cases  a)  or  h).  Without  loss  of  generality,  assume  that  the  sequence  {r^.}  is 
convergent,  ami  let  the  limit  point  be  x* .  a  second-order  KKT  point  for  problem  NLP,  from 
Lemma  (i.  1 . 1 . 

Define a new sequence of vectors $\{v_k\}$ from

$$ v_k = \frac{p_k}{\|p_k\|}, $$

and select a convergent subsequence where either case a) or case b) holds for all $k$. (The
index $k$ will also be used to denote the elements in the subsequence.) Let $v^*$ be the limit
point for the subsequence.

From the conditions for cases a) and b),

$$ |p_k^T H_k p_k| \ge \kappa\,|g_k^T p_k| \;\Rightarrow\; |p_k^T H_k v_k| \ge \kappa\,|g_k^T v_k|, $$

and in the limit $g^{*T} v^* = 0$. But this implies $\lambda^{*T} A^* v^* = 0$, and from strict complementarity
$v^* \in \mathcal{N}(A^*)$. We also have

$$ \forall k \quad p_k^T H_k p_k < 0 \;\Rightarrow\; v^{*T} H^* v^* \le 0, $$

but this contradicts the fact that we must have a strong minimizer, from assumption A6,
proving the result. |

This result allows us to define the following constant. From Lemmas (>.1.1 and d.1.1,
assumption A6 and Lemma 6.2.1,

$\epsilon_s$ is a positive constant such that $\|p_k\| \le \epsilon_s$ implies that $p_k$ has been obtained as a second-
order  KKT  point,  the  correct  active  set  has  been  identified,  the  smallest  eigenvalue  of 
the  reduced  Hessian  is  at  least  n .  and  a  normal  linesearch  is  used. 

Finally, note that for cases a) and b), $\phi_C'(0) \le 0$.

Linesearch  termination 

When  we  use  the  curvilinear  search,  it  may  no  longer  be  possible  to  satisfy  the  termination 
conditions  given  for  the  normal  linesearch  in  Chapter  2.  (2. 2d!)  and  (2.2.1);  consequently, 
they  need  to  be  replaced.  Satisfactory  termination  criteria  of  a  similar  type  to  those  given 
in  Chapter  2  are  now  presented.  A  check  is  made  whether  the  condition 

$$ \phi_C(1) \le \phi_C(0) + \tfrac{1}{2}\sigma\,\phi_C''(0) \tag{6.2.13} $$

is satisfied by the step $\alpha = 1$. If not, then a value $\alpha \in (0, 1)$ satisfying

$$ \phi_C(\alpha) \le \phi_C(0) + \sigma\frac{\alpha^2}{2}\,\phi_C''(0), \tag{6.2.14a} $$

$$ \phi_C'(\alpha) \ge \eta\,(\phi_C'(0) + \alpha\,\phi_C''(0)), \tag{6.2.14b} $$

for $1 > \eta \ge \sigma > 0$ and $\tfrac{1}{2} > \sigma$, is computed as the step length. The existence of a value $\alpha$
satisfying (6.2.14) will be shown in Lemma 6.3.6.
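
The acceptance tests (6.2.13) and (6.2.14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `accepts` and `choose_step` are hypothetical, the merit function along the search arc and its derivative are supplied by the caller as `phi` and `dphi`, and the plain halving loop stands in for whatever safeguarded interpolation a production code would use. It is not the procedure implemented in NPSOL.

```python
def accepts(phi, dphi, alpha, phi0, dphi0, d2phi0, sigma, eta):
    """Return True if alpha satisfies both (6.2.14a) and (6.2.14b)."""
    # (6.2.14a): sufficient decrease measured against the initial curvature.
    decrease = phi(alpha) <= phi0 + sigma * 0.5 * alpha**2 * d2phi0
    # (6.2.14b): the slope at alpha is not too negative.
    slope = dphi(alpha) >= eta * (dphi0 + alpha * d2phi0)
    return decrease and slope


def choose_step(phi, dphi, dphi0, d2phi0, sigma=0.1, eta=0.9, max_halvings=30):
    """Pick a step in (0, 1]; assumes 1 > eta >= sigma > 0 and sigma < 1/2.

    dphi0 and d2phi0 are the first and second derivatives of phi at 0.
    Returns None if the (illustrative) halving loop gives up.
    """
    phi0 = phi(0.0)
    # (6.2.13): try the unit step first.
    if phi(1.0) <= phi0 + 0.5 * sigma * d2phi0:
        return 1.0
    alpha = 1.0
    for _ in range(max_halvings):
        alpha *= 0.5  # simple backtracking, an assumption of this sketch
        if accepts(phi, dphi, alpha, phi0, dphi0, d2phi0, sigma, eta):
            return alpha
    return None
```

For example, with `phi = lambda a: -a**2 + a**4` and `dphi = lambda a: -2*a + 4*a**3` (so that the initial slope is 0 and the initial curvature is -2), the unit step fails (6.2.13) and the first halved step is accepted.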

From the definitions of $v$ and $w$, when case b) applies the form of the step in the original
variables is given by $\alpha((1 + \gamma)\alpha - \gamma)p$. A consequence of this expression is that for a value

$$ \alpha = \frac{\gamma}{1 + \gamma} $$

we get no change in the $x$ variables. Though this step has no effect on the convergence
proofs (since we are still making finite changes in the other variables), such a step may
be considered unsatisfactory from a practical point of view. We present an alternative
linesearch criterion for this case.
Let

$$ \hat\alpha = \frac{\gamma}{2(1 + \gamma)}. $$

If (6.2.13) holds, then let $\alpha = 1$; otherwise, check condition (6.2.14a) for $\alpha = \hat\alpha$:

$$ \phi_C(\hat\alpha) \le \phi_C(0) + \sigma\frac{\hat\alpha^2}{2}\,\phi_C''(0). \tag{6.2.15} $$

If this condition is not satisfied either, compute a value $\alpha \in (0, \hat\alpha)$ satisfying (6.2.14).
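
The stalling step mentioned for case b) is easy to check numerically. The sketch below assumes that the change in the original variables is $\alpha((1 + \gamma)\alpha - \gamma)p$, as read from the text above; the function names are hypothetical and serve only as illustration.

```python
def x_step_factor(alpha, gamma):
    """Multiple of p added to x in case b): alpha * ((1 + gamma) * alpha - gamma)."""
    return alpha * ((1.0 + gamma) * alpha - gamma)


def stalling_step(gamma):
    """Step length in (0, 1) at which the x variables do not move."""
    return gamma / (1.0 + gamma)
```

With `gamma = 3.0` the stalling step is 0.75, and `x_step_factor(0.75, 3.0)` evaluates to zero, while the unit step gives a factor of one for any `gamma`.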


6.3. Definition and properties of the penalty parameter

To guarantee convergence of the algorithm, each step must satisfy a sufficient descent
condition. This implies the need to select the penalty parameter in such a way that the initial
derivatives of the merit function (the quantities bounding the descent achieved in the
linesearch) take acceptable values, and in particular, property P4 (suitably extended) holds for
the algorithm, both when the normal linesearch and when the curvilinear search are used.
The next paragraphs indicate a way in which this can be done for both cases, and the rest
of the section presents the properties associated with this definition.
Definition of the penalty parameter

When trying to show that property P4 holds for this algorithm, we face an immediate
complication. There is no longer any quantity readily available that provides a good measure
for the bound $\beta_H\|p_k\|^2$ on the initial derivatives for the linesearch. For example, the values
used  in  Chapters  -1  and  5,  pTII  p  and  pTzZTIl  Zpz  +  ||d  +  u’c|[2  respectively,  may  not  even  be 
positive. Consequently, we introduce in this section a definition of $\rho_k$ based on the value of
the penalty parameter that makes the corresponding derivatives zero, with the addition of
adequate safeguards.


Let 


T 


j  / ( A  -  />(e  -  *))  +  TtPic,  -  .s,)u’7V2c,  u>  for  the  curvilinear  search, 


for  the  normal  linesearch; 


and 


Ip  =  Ip  +  2(2A  -  pY(c  -  s). 


From (6.2.11),


Ip  <  -^ilHI2  +  ^Ik-  4- 


where  we  inn  assume  that  d'>  >  ,1\. 
Define  p\  from 


p\  =  < 


f  r^tiT  if  T  >  0. 
||c  -  .s]|2 

fp 


otherwise. 


c  -  SI 


I.et  p~  denote  the  value  of  the  penalty  parameter  at  the  previous  iteration.  If  p~  =  0  and 
p\  <  0,  replace'  fp  in  the  previous  definition  by  IP  +  Ai||7-'H2>  where  ,ih  >  0  is  some  specified 
parameter,  and  recompute  the  value  for  p\  accordingly. 

Let 


0 

K 


l|r||2  +  (p  -  fi)Tc  -  (p  +  Zpz)THYpY , 


if  jje||  =  0  or  the  constraint  is  not  active, 
otherwise. 


where $\mu$ denotes the QP multipliers at the solution of the QP subproblem, if available, or
the multiplier estimate otherwise.

From the non-singularity of the Jacobian at any limit point of the sequence $\{x_k\}$
(assumption A3), there exists a constant $K > 0$ such that

■n>>  >  a„A\\Ypy\\  =>  =  A. 

Ikll  ZsvA 

It follows that $b$ satisfies


IIMI  <  ||r||  +  |t/i-/7||  +  A'|!//(p+ZPz)|! 


This implies the boundedness of $\|b\|$ and also, from Lemma 3.4.1 and condition C8,

$$ \|p_k\| \to 0 \;\Rightarrow\; \|b_k\| \to 0. $$


Define  pi  from 


11^  +  HI 
%  -  HI 


if  ^<pc"(0,p  )  >  -pl'/jHZpz  -  ||c||2 
or  <PN  (0, p~)  >  -p7zZTIl Zpz  -  ||c||2. 


0  otherwise. 


To define a bound for the penalty parameter, we introduce a positive constant $\beta_{th}$, and
let


P  = 


ma x(p,.p2)  iflbll  <  flth  and  |)r  -  ,s||  >  ||p|l2, 
max(pi,0)  otherwise. 


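Also, let

$$ \rho_m = \begin{cases} \rho_{\min} & \text{if } \rho^- = 0, \\ 2\rho^- & \text{otherwise.} \end{cases} $$

Finally, the bound $\bar\rho$ is given by

$$ \bar\rho = \begin{cases} 2\hat\rho & \text{if } 2\hat\rho > \rho_m, \\ \rho_m & \text{if } \rho_m \ge 2\hat\rho > \rho^-, \\ \rho^- & \text{if } \rho^- \ge 2\hat\rho. \end{cases} $$

From this definition it immediately follows that $\bar\rho \ge 2\hat\rho$, and if $\bar\rho > 0$ then $\bar\rho \ge \rho_{\min}$.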
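
The three-way rule for the bound can be written as a small function. The sketch below is an illustration only: the name `penalty_bound` and its argument names are hypothetical, and it assumes that the intermediate value equals the minimum penalty when the previous penalty is zero and twice the previous penalty otherwise.

```python
def penalty_bound(rho_hat, rho_prev, rho_min):
    """Return the new bound on the penalty parameter.

    rho_hat  -- value required by the current iteration (may be zero)
    rho_prev -- penalty parameter at the previous iteration (>= 0)
    rho_min  -- smallest positive value the penalty is allowed to take
    """
    # Assumed definition of the intermediate value rho_m.
    rho_m = rho_min if rho_prev == 0.0 else 2.0 * rho_prev
    if 2.0 * rho_hat > rho_m:
        return 2.0 * rho_hat   # the required value dominates
    if 2.0 * rho_hat > rho_prev:
        return rho_m           # increase, but by at least a fixed factor
    return rho_prev            # the previous penalty is already large enough
```

In every branch the returned value is at least `2 * rho_hat`, and a penalty that leaves zero jumps to at least `rho_min`, so it cannot grow by arbitrarily small increments.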


Properties  of  the  penalty  parameter 

From  the  previous  definition  we  can  show  that  property  P4  holds  for  the  algorithm. 
Lemma 6.3.1. For $\bar\rho_k \ge 0$ defined as above, there exists a constant $\beta_H > 0$ such that either

$$ \phi_C''(0, \rho) \le -\beta_H\|p_k\|^2, \quad\text{or}\quad \phi_N'(0, \rho) \le -\beta_H\|p_k\|^2, \tag{6.3.1} $$

for all $\rho \ge \bar\rho_k$.


Proof. Define a value $\epsilon'$ such that $\min(\beta_{th}, \epsilon_s) > \epsilon' > 0$, and whenever $\|p_k\| \le \epsilon'$ we have
$(\mu_k + b_k)^T s_k \ge 0$. Consider the following cases:


•  If  lie  -  sll  <  f^lblP.then 


c?c"(Q,p)  =  fp  +  T  -  2 p\\c  -  s||2  <  fp  <  -^i||p||2. 
o'v'(0,/i)  =  \fp  -  pjjc  -  s||2  <  \fp  <  -\lh\\p\\2- 


If  ||  e  -  -"11  >  T-y||p||2  and  ||/;||  >  ('.  then  if  p  >  0,  from  p>  p  i, 


W2 


/p  +  T  -  2/?||c  -  s||2  <  -\p\\c  -  s||. 


implying 


e^y2 

2t3'2) 

''(0 ,P)  <  J*||2  <  -|f>nnn(^-)"""112 


>C"(0./3)  <  -^>mi„||c-  f||2  <  -^P,nin(j^r)  lb||2, 


If  P  =  0, 


<bc"(0,p)  <  -/?ft||p||2, 

'p^'io.p)  <  -yh\\P\\2. 


•  If  lie 


>  ^IbH2  and  ||g||  <  from  ||p|j  <  c5  we  must  have  used  the  normal 
2.1) 


linesearch.  and  from  the  definition  of  p  it  must  hold  that  p  >  max(p  , p?). 

<Pv'(0,/i)  =  -pTI[p  -  fiTc  +  (2A  -  p)T(c  -  s)  -  p\\c  -  s||2 

=  -p[ZrUZPz  -  ||e||2  -  (2£  +  b)T{c  -  s)  -  (,t  +  b)rs  -  p\\c  -  s||2 
<  -2p^T///pz-2||r||2 


(6.3.2) 
implying that property P4 holds. |

Following  the  procedure  outlined  in  Chapter  3  for  the  global  convergence  proof,  the  next 
step  is  to  establish  bounds  for  the  rate  of  growth  of  the  penalty  parameter.  The  next  lemma 
shows  that  property  P5  holds  for  this  algorithm. 

Lemma 6.3.2. For any iteration $k_l$ in which the value of $\rho$ is modified,

$$ \rho_{k_l}\|p_{k_l}\|^2 \le N $$

and

$$ \rho_{k_l}\|c_{k_l} - s_{k_l}\| \le N, $$

for some constant $N$.

Proof. We show first that for some positive constant $K$, whenever the value of $\rho$ has to be
modified,

$$ \|c - s\| \ge K\|p\|^2. \tag{6.3.3} $$

Considering  the  cases  introduced  in  the  last  lemma,  whenever 

the  result  holds  immediately.  If  this  is  not  the  case,  assuming  that  fi2  >  li\  +  tfh  it  follows 
that  p  -  max(/>;,0)  and  from 

fP  <  -ih\\l>\\2  +  P'iWc  -  s\\  <  -[3'2\\c  -  s\\  <  0, 




but from $\|c - s\| \ge K\|p\|^2$ it follows that

$$ \rho\|p\|^2 \le N, $$

completing the desired result. |

The  proof  now  proceeds  along  the  same  lines  as  those  given  in  Chapter  3.  If  the  normal 
linesearch  is  used,  for  the  corresponding  iterations  the  results  given  in  Lemmas  3.6.1  to 
3.6.6  hold  as  given  in  Chapter  3.  If  the  curvilinear  search  is  used,  it  is  necessary  to  modify 
the  proofs  for  some  of  these  results,  as  follows. 

Lemma 6.3.3. At any iteration where $\rho$ has to be modified,

$$ c^T\mu \le N_1\|p\|^2 + N_2\|c - s\|, $$

where $\mu$ denotes the QP multipliers, and $N_1$ and $N_2$ are positive constants.

Proof. If $\|p\| > \epsilon_s$, the result follows from assumptions A2 and A3. If $\|p\| \le \epsilon_s$, then $p$
has been obtained as the solution for the QP subproblem, and it satisfies

$$ g^T p + p^T H p = -c^T\mu. $$

Furthermore, a normal linesearch has been performed.

Let  p"  denote  the  value  of  the  parameter  before  being  modified;  if  p  =  p\,  then 

OA"(0 ,/>“)  >  d>A’'(0,p)  >  -\fp  >  ^i||p||2  -  \&\\c  -  s\\,  (6.3.4) 

and  if  p  =  p2. 

<t>s'(Q,p~)  >  -pTzZTHZpz-  p|!2  >  -/J/z.IIpH2.  (6.3.5) 

From

$$ \phi_N'(0, \rho^-) = p^T g + (2\lambda - \mu)^T(c - s) - \rho^-\|c - s\|^2 $$

and the previous equations,

$$ c^T\mu = -p^T H p - \phi_N'(0, \rho^-) + (2\lambda - \mu)^T(c - s) - \rho^-\|c - s\|^2 $$

<  HikWvW2  +  (4  +  ||2A  -  7*11)11°  -  all  -  P~ll°  -  -Ml2- 

From the nonnegativity of $\rho^-\|c - s\|^2$ and the boundedness of the Lagrange multiplier
estimate the desired result follows. |

The  proof  of  Lemma  3.6.2  does  not  require  any  modification  for  this  case.  The  proof  of 
Lemma  3.6.3  needs  to  be  slightly  modified,  as  follows. 
Lemma 6.3.4. There exists a bounded constant $M$ such that, for all $l$,

$$ \rho_{k_l}\sum_{k=k_l}^{k_{l+1}-1}\|\alpha_k p_k\|^2 < M. \tag{6.3.6} $$

Proof.  In  the  case  when  a  normal  linesearch  is  used,  the  proof  follows  along  the  same  lines 
as  the  proof  for  Lemma  3.6.3.  For  the  case  when  a  curvilinear  search  is  used,  consider  the 
following  argument. 

The subscripts 0 and $K$ denote quantities associated with iterations $k_l$ and $k_{l+1}$
respectively. Consider the identity

$$ \phi_0 - \phi_K = \sum_{k=0}^{K-1}(\phi_k - \phi_{k+1}), \tag{6.3.7} $$

and observe that the termination criterion for the linesearch (6.2.14) and the fact that the
penalty parameter is not increased imply that for $0 \le k \le K - 1$,

$$ \phi_k - \phi_{k+1} \ge -\tfrac{1}{2}\sigma\alpha_k^2\,\phi_k''(0), \tag{6.3.8} $$

where $0 < \sigma < 1$. Since $\sigma$ and $\beta_H$ are positive, combining (6.3.7), (6.3.8) and the result
of Lemma 6.3.1 gives

\opH  E  a^llpfcH2  <  -  4>ck. 

k= 0 

Rearranging  terms  we  obtain 

K- 1 

\°Ph  E  Kp*I|2  <  (6-3  9) 

k=0 

The  result  then  follows  by  multiplying  (6.3.9)  by  p0  and  using  Lemma  3.6.2.  | 

Lemma  3.6.4  does  not  require  any  modification. 

Lemma  3.6.5  applies  directly  to  the  case  when  a  normal  linesearch  is  performed.  The 
corresponding  version  of  this  result  for  the  case  when  we  use  a  curvilinear  search  is  given 
in  the  following  lemma. 

Lemma 6.3.5. For $0 \le \theta \le \alpha$,

$$ \phi_k'''(\theta) \le -6\alpha_k\,\phi_k''(0) - 12\alpha_k\,\phi_k'(0) + N\|p_k\|^2, $$

where $N$ is a constant independent of $k$.
Proof. The third derivative of $\phi_C$ is given by

Oc  (o)  =  F(xa)(2av  +  w)  +  ^,(2avi  +  wt)(2av  +  w)T\/'f  F(xa)(‘2av  +  w) 

rm 

-  O,  (a)  +  P<p2  («), 

where

4>i  (o )  =  Gtj{yc(xa)(2av  +  w)  -  2oux  -  u2Sj  +  6(2o/i  +  l2)r(Vc(ia)t'  -  2«i) 

+  3 £((2o<],  +  t2,  )(2m>  +  tv)TV2cl(xa)(2av  +  tv)  +  6^,A0ii;TV2c,(x0)(2av  +  tv) 

+  +  u'k)(2av  +  w)TVkc,(xa)(  2av  +  tv) 

and 

<?2  («)  =  6(Vc(ja)(2ai;  -f  w)  -  2att,  -  u2)  ^2Vc(x0 )r  -  2ui) 

+  3J2,  (Vc,-(x0)(2au  +  tv)  -  2auXx  -  u2,)(2at>  +  w)rV2c,(xa)( 2ax  +  w) 

+  Ei(ci(xa)~  -sa,)l2fc(2ow^  +  wk)(2av  +  tu)TV*c,(xa)(2ai’  +  re) 

+  G£,(c.U'a)  -  s0,i'jvTV2ct(xa)(2av  +  w). 

To  compute  a  bound  for  the  third  derivative,  the  following  Taylor  expansions  are  useful: 

Vc,(x0)(2ae  +  u>)-2au|(  -  u2,  -  -2a(c.,  —  st+u,  -  wTV2  Citv  -  (2av  +  tv)TV2Ci(zi)(2av + tv]) , 
c,(xa)  -  .st>1  =  (1  -  a2)(c,  -  Si)  -  a2^a>i  +  jtvTV2c,tv  -  j(2av  +  u>)TV2c,(z')( 2av  +  w)j. 

From  these  results,  the  definitions  of  v  and  w  and  Lemmas  6.3.4  and  3.6.4,  it  follows  that 

0C  (a)  =  24otf(c  -  s)  +  12op||c  -  s||2  +  0(||p||2) 

=  24 at]\c  -  s)  +  6 awTSJ2Fw  +  \2agTv  +  12o(A  -  tx)T{c  -  s)  -  6a<t>c"(0)  +  0(||p||2) 

=  I2a/i1(c  -  s)  +  V2agTv  -  6a<t>c  (0)  +  0(||p||2). 

We must now consider two cases. If $v \ne 0$ we can write

$$ \phi_C'''(\alpha) = 12\alpha v^T(g - A^T\mu) - 6\alpha\,\phi_C''(0) - 12\alpha\mu^T s + O(\|p\|^2), \tag{6.3.10} $$

and if $w \ne 0$ but $v = 0$ then

$$ \phi_C'''(\alpha) = 12\alpha w^T(g - A^T\mu) - 6\alpha\,\phi_C''(0) - 12\alpha\,\phi_C'(0) - 12\alpha\mu^T s + O(\|p\|^2). \tag{6.3.11} $$

From condition C8 on the multipliers, implying that for large enough $k$, $\mu^T s \ge 0$, the final
result follows:

$$ \phi_C'''(\alpha) \le -6\alpha\,\phi_C''(0) - 12\alpha\,\phi_C'(0) + N\|p\|^2 \tag{6.3.12} $$

for some positive constant $N$. |

It  is  now  possible  to  prove  that  the  steplength  ak  is  also  bounded  away  from  zero  in 
the  case  when  a  curvilinear  search  is  performed.  For  the  normal  linesearch,  the  equivalent 
result  is  given  in  Lemma  3.6.6. 

Lemma 6.3.6. If a curvilinear search is performed, the steplength $\alpha_k$ ($0 < \alpha_k \le 1$) satisfies

$$ \phi_{C_k}(\alpha_k) - \phi_{C_k}(0) \le \sigma\frac{\alpha_k^2}{2}\,\phi_{C_k}''(0) $$

and $\alpha_k \ge \bar\alpha$, where $0 < \sigma < 1$, and $\bar\alpha > 0$ is independent of the iteration.

Proof. We show that a step satisfying the conditions for the curvilinear search termination
criteria exists and is uniformly bounded away from zero. To take into account the variant
in the termination conditions introduced for case b), let $\tilde\alpha$ denote a given initial value, to
be selected as either 1 or $\hat\alpha$.

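Assume that condition (6.2.14a) is not satisfied for $\alpha = \tilde\alpha$; that is,

$$ \phi_C(\tilde\alpha) > \phi_C(0) + \sigma\frac{\tilde\alpha^2}{2}\,\phi_C''(0). $$

Define

$$ \psi_\sigma(\alpha) = \phi_C(\alpha) - \phi_C(0) - \sigma\frac{\alpha^2}{2}\,\phi_C''(0), $$

so that

$$ \psi_\sigma'(\alpha) = \phi_C'(\alpha) - \sigma\alpha\,\phi_C''(0), \qquad \psi_\sigma''(\alpha) = \phi_C''(\alpha) - \sigma\,\phi_C''(0). $$

For $\alpha = 0$,

$$ \psi_\sigma(0) = 0, \qquad \psi_\sigma'(0) = \phi_C'(0) \le 0, \qquad \psi_\sigma''(0) = (1 - \sigma)\,\phi_C''(0) < 0. $$

Define also

$$ \psi_\eta(\alpha) = \phi_C(\alpha) - \phi_C(0) - \eta\alpha\,\phi_C'(0) - \eta\frac{\alpha^2}{2}\,\phi_C''(0). $$

From $\psi_\sigma(\tilde\alpha) > 0$, there must exist a value $\alpha_1 \in (0, \tilde\alpha)$ for which $\psi_\eta'(\alpha_1) \ge 0$. Otherwise,
if $\psi_\eta'(\alpha) < 0$ for all $\alpha \in [0, \tilde\alpha)$, integrating on this interval we have

$$ \phi_C(\tilde\alpha) \le \phi_C(0) + \eta\tilde\alpha\,\phi_C'(0) + \eta\frac{\tilde\alpha^2}{2}\,\phi_C''(0), \tag{6.3.13} $$

implying

$$ \psi_\sigma(\tilde\alpha) \le \eta\tilde\alpha\,\phi_C'(0) + \frac{\eta - \sigma}{2}\,\tilde\alpha^2\,\phi_C''(0) \le 0, \tag{6.3.14} $$

which contradicts $\psi_\sigma(\tilde\alpha) > 0$.

Let $\alpha_1$ be the smallest such point, implying that $\psi_\eta'(\alpha) < 0$ for all $\alpha \in [0, \alpha_1)$. If we integrate
again between 0 and $\alpha_1$,

$$ \phi_C(\alpha_1) \le \phi_C(0) + \eta\alpha_1\,\phi_C'(0) + \eta\frac{\alpha_1^2}{2}\,\phi_C''(0), \tag{6.3.15} $$

$$ \psi_\sigma(\alpha_1) \le \eta\alpha_1\,\phi_C'(0) + (\eta - \sigma)\frac{\alpha_1^2}{2}\,\phi_C''(0) \le 0, \tag{6.3.16} $$

so $\alpha_1$ satisfies the termination conditions.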

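For $\alpha_1$ we have

$$ \phi_C'(\alpha_1) - \eta\,\phi_C'(0) - \eta\alpha_1\,\phi_C''(0) = 0, \tag{6.3.17} $$

and using a series expansion for $\phi_C'$,

$$ \phi_C'(\alpha_1) = \phi_C'(0) + \alpha_1\,\phi_C''(0) + \frac{\alpha_1^2}{2}\,\phi_C'''(\theta), \tag{6.3.18} $$

where $\theta \in (0, \alpha_1]$.

The previous equations imply

$$ (1 - \eta)\,\phi_C'(0) + \alpha_1(1 - \eta)\,\phi_C''(0) + \frac{\alpha_1^2}{2}\,\phi_C'''(\theta) = 0, \tag{6.3.19} $$

and as we know that a positive root exists, we must have $\phi_C'''(\theta) > 0$. The root is given by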


and  the  following  bound  holds: 


(*i  >  max 
From property P4, $\phi_C''(0) \le -\beta_H\|p\|^2$ and

$$ \phi_C'''(\theta) \le -18\min(\phi_C''(0), \phi_C'(0)) + N\|p\|^2 $$

for some $N > 0$, giving

2(1  -  Q)H H  \ 

18/?h  +  Ar  )' 

completing  t he  proof.  | 

We  can  now  present  the  global  convergence  theorem  for  this  algorithm. 


>  max 


2(1  -  ri)0H 
18  /3h  +  N 


(6.3.22) 


Theorem 6.3.1. The algorithm described in this chapter has the property that

$$ \lim_{k\to\infty}\|p_k\| = 0. \tag{6.3.23} $$

Proof.  The  proof  is  similar  to  the  one  for  Theorem  4.3.1.  We  include  it  here  for  complete¬ 
ness. 

If $\|p_k\| = 0$ for any finite $k$, the algorithm terminates and the theorem is true. Hence
we assume that $\|p_k\| \ne 0$ for any $k$.

When there is no upper bound on the penalty parameter, the uniform lower bound on
$\alpha$ from Lemmas 3.6.6 and 6.3.6, and the bounds on the growth of the penalty parameter
given by Lemmas 3.6.3 and 6.3.4, imply that for any $\delta > 0$ we can find an iteration index
$K$ such that

$$ \|p_k\| \le \delta \quad\text{for } k \ge K, $$

which implies that $\|p_k\| \to 0$, as required.

In the bounded case, we know that there exists a value $\tilde\rho$ and an iteration index $\tilde K$ such
that $\rho = \tilde\rho$ for all $k \ge \tilde K$. We consider henceforth only such values of $k$.

The proof is by contradiction. We assume that there exists $\epsilon > 0$ and an infinite
subsequence $\{k_i\}$ such that $\|p_{k_i}\| \ge \epsilon$ for all $i$. Consider only indices $i$ such that $k_i \ge \tilde K$.
Every iteration after $\tilde K$ must yield a strict decrease in the merit function because, using
Lemmas 3.6.6, 6.3.1 and 6.3.6, and the fact that the penalty parameter is not modified,

$$ \phi(\alpha) - \phi(0) \le -\tfrac{1}{2}\sigma\bar\alpha^2\beta_H\|p\|^2 < 0. $$

The adjustment of the slack variables $s$ in step (ii) of the algorithm can only lead to a further
reduction in the merit function, as $L$ is quadratic in $s$ and the minimizer with respect to $s_i$
is given by $c_i - \lambda_i/\rho$. For iterations from the subsequence we have

d>(z*,+  1 )  -  <t>{xk)  <  <p(xk,  +  i)-  d>{xk)  <  -\aa2l3He2. 
Therefore, since the merit function with $\rho = \tilde\rho$ decreases by at least a fixed quantity at
every step in the subsequence, it must be unbounded below. But this is impossible, from
assumptions A1, A2 and Lemma 2.1.1. Therefore, (6.3.23) must hold. |

Corollary 6.3.1.

$$ \lim_{k\to\infty}\|x_k - x^*\| = 0. $$

Proof.  The  result  follows  immediately  from  Theorem  6.3.1  and  Lemma  3.1.1.  | 


Corollary 6.3.2.

$$ \lim_{k\to\infty}\|\lambda_k - \lambda^*\| = 0. $$


Proof.  The  result  follows  from  Lemma  3.7.1,  given  the  results  in  Lemma  3.6.6  and  Corol¬ 
lary  6.3.1.  | 

6.4.  Rate  of  convergence 

After  global  convergence  has  been  established,  the  next  step  is  to  prove  that  under  certain 
conditions  the  algorithm  has  a  quadratic  rate  of  convergence.  Note  that  in  this  section 
we  can  always  assume  that  Lemma  6.2.1  applies,  as  we  are  only  interested  in  the  limiting 
behavior  of  the  algorithm.  Consequently,  we  need  only  consider  the  case  when  a  normal 
linesearch  is  used. 

Again,  it  is  necessary  to  start  by  presenting  some  results  on  the  growth  rate  of  the 
penalty  parameter.  The  next  lemma  establishes  property  P7  for  the  algorithm. 

Lemma 6.4.1. If there exists an infinite subsequence $\{k_l\}$ of iterations in which the penalty
parameter is modified,

$$ \lim_{l\to\infty}\rho_{k_l}\|p_{k_l}\|^2 = 0 $$

and

$$ \lim_{l\to\infty}\rho_{k_l}\|c_{k_l} - s_{k_l}\| = 0. $$

Proof.  We  drop  the  subscript  by  in  what  follows.  From  the  definition  of  p. 


-  •-'!!  =  l|2(  +  *||, 
and  from  the  fact  that  ||^|j  -*  0  as  ||pc||  —  0,  it  must  hold  that

lim  ||2£fc,  +  bkl\\  =  0. 

I— *OC 

Assume  that  ||/;||  <  cs.  From  (6.3.2), 

o"'(0.p2)  <  -0q\\p\\2<  0. 

and  from 

<£‘v(0,  \px )  =  0 

it  must  hold  that  p\  <  2 />_>,  implying  that 

,l'">  Pk,\\ckt  -  s/t,l|  =  0. 

/—•CO 

We can now use (6.3.3) to get

$$ \lim_{l\to\infty}\rho_{k_l}\|p_{k_l}\|^2 = 0, $$

completing the proof. |

The  proofs  for  Lemmas  3.8.1,  3.8.2  and  3.8.3  hold  for  this  algorithm. 

Conditions  for  quadratic  convergence 

The  last  requirement  for  the  proof  of  quadratic  convergence  is  to  establish  that  a  unit  step 
is  always  taken  for  points  close  enough  to  the  solution  (property  P8).  The  condition  needed 
to  prove  this  result,  and  to  ensure  that  the  sequence  {xk  -  x*}  converges  quadratically,  is 
a  slightly  modified  version  of  condition  C12  on  the  multipliers: 

C12”. The multiplier estimate satisfies

$$ \|\mu_k - \lambda^*\| = O(\|x_k + p_k - x^*\|). $$

Lemma 6.4.2. If condition C12” is satisfied, there exists an iteration index $K$ such that
for all indices $k \ge K$ a unit steplength is accepted: $\alpha_k = 1$.

Proof. Assume that $\|p\|$ is small enough so that a normal linesearch has been performed.
Given that condition C11 in Chapter 4 is trivially satisfied for this algorithm (remember
that $H_k$ is the exact Hessian of the Lagrangian function), from Lemma 1.1.3 we have that

$$ \|x_k + p_k - x^*\| = o(\|x_k - x^*\|); $$

using this result in condition C12” we obtain

$$ \|\mu_k - \lambda^*\| = o(\|x_k - x^*\|). $$

Hence condition C12 is also satisfied. We can now use the same argument presented in the
proof of Theorem 1.1.1 to conclude that the desired result holds for this algorithm. |

The  proof  of  quadratic  convergence  is  given  in  the  following  theorem. 

Theorem 6.4.1. The algorithm presented in this chapter converges quadratically.

Proof. It is enough to show that $\|x + p - x^*\| = O(\|x - x^*\|^2)$, as the previous lemma
showed that a unit step is always taken for large $k$. Assume $k$ to be large enough so that
$p_k$ is obtained as the solution of the QP subproblem, and the correct active set has been
identified.

We  drop  the  iteration  index  k  in  all  that  follows.  Consider  first  the  decomposition  of 
x  +  p  -  x*  into  null-space  and  range-space  components: 

x  -  x*  =  Zu  +  v't\ 

For the range-space component, consider the series expansion restricted to the active
constraints at the point:

$$ 0 = c^* = c + A(x^* - x) + O(\|x - x^*\|^2). $$

From $Ap = -c$ and the previous decomposition,

$$ A(x + p - x^*) = O(\|x - x^*\|^2). $$

For  the  null-space  component,  consider  the  corresponding  Taylor  series  expansions 
around  x: 


-  (J*  =  g  +  Y2F(x*  -  x)  +  G»(||x  -  J-*||2). 

A* T\*  =  A1 A*  +  E.AtV2c,(x*  -  x)  +  0(||.r  -  x*||2). 

Combining  these  two  results, 

II  (X  -  **)  +  A  rX*  =  g  +  £,(A,  -  A*  )V2c,(  x  -  x*  )  +  0(  ||.r  -  jr*||2). 


and from $Hp + g = A^T\mu$,

$$ H(x + p - x^*) + A^T(\lambda^* - \mu) = \sum_i(\lambda_i - \lambda_i^*)\nabla^2 c_i\,(x - x^*) + O(\|x - x^*\|^2). $$

Now using condition C12” on the multiplier estimate,

$$ \|\mu_k - \lambda^*\| = O(\|x_k + p_k - x^*\|), $$

and assuming that $\|p\|$ is small enough so that a step of one is taken in all iterations and
therefore $\lambda_k = \mu_{k-1}$, the previous equation reduces to

$$ H(x + p - x^*) + A^T(\lambda^* - \mu) = O(\|x - x^*\|^2). $$

Putting these results together,

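$$ \begin{pmatrix} H & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x + p - x^* \\ \lambda^* - \mu \end{pmatrix} = O(\|x - x^*\|^2), $$

and using the non-singularity of the reduced Hessian and the Jacobian of the active
constraints at the solution,

$$ \left\| \begin{pmatrix} x + p - x^* \\ \lambda^* - \mu \end{pmatrix} \right\| = O(\|x - x^*\|^2), $$

implying

$$ \lim_{k\to\infty}\frac{\|x_k + p_k - x^*\|}{\|x_k - x^*\|^2} = K < \infty, $$

completing the proof.

6.5. Summary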


In this chapter we have introduced and analyzed a third algorithm based on the framework
algorithm of Chapter 2. Its distinctive feature is the use of exact Hessian matrices of the
objective and constraint functions. As before, the search direction is obtained from an
incomplete solution for the QP subproblem. Some conditions on the incomplete solution have
been presented that allow some convergence properties of the algorithm to be established.
The results are:

• When the search direction satisfies the conditions introduced in Section 6.1, the
multiplier estimate satisfies conditions C7-C9, and the Hessian for the QP subproblem,
$H_k$, is the exact Hessian of the Lagrangian function, then the algorithm is globally
convergent.

• If the multiplier estimates $\mu_k$ satisfy the following condition

C12”. $\|\mu_k - \lambda^*\| = O(\|x_k + p_k - x^*\|)$,

then the algorithm converges quadratically.

Chapter  7 


Numerical  Results 


In this chapter we present numerical results obtained from an implementation of the
algorithm described and analyzed in Chapter 4. The implementation has been written as a
modification of NPSOL, with the only difference being the use of an incomplete solution
for the QP subproblem as the search direction, and the consequences of this change on the
rest of the algorithm. The details of the modification are given in the following section.

The  purpose  of  the  testing  reported  in  this  chapter  is  to  demonstrate  that  the  efficiency 
and  robustness  of  the  modified  algorithm  are  comparable  to  those  of  NPSOL.  Naturally,  we 
can  only  test  the  hypothesis  on  the  domain  of  problems  NPSOL  is  designed  to  solve,  namely 
problems  having  a  moderate  number  of  variables  and  constraints,  although  on  these  prob¬ 
lems  the  opportunities  for  improvement  are  limited,  as  we  discuss  in  later  sections.  What 
this  implementation  really  tests  is  whether  the  introduction  of  flexibility  in  the  determina¬ 
tion  of  the  search  direction  has  a  significant  cost. 

7.1.  Implementation 

In this section we describe the implementation used for the early-termination rules
introduced in Chapter 2. The rest of the algorithm is identical to NPSOL, and a detailed
description of other implementation issues can be found in Gill et al. [GMSW86a].

From the $k$th QP subproblem, the search direction $p_k$ is computed according to the
following steps. (The subscript $k$ corresponding to the iteration number is dropped from
now on.)
• An initial feasible point $p_0$ is obtained following the same procedure as NPSOL.
Conditions (2.2.6) and (2.2.7) have not been implemented, as the feasibility phase in
NPSOL seems to give results that are adequate with respect to these conditions.

• The solution process continues until the first stationary point $\bar p$ is reached, and the
corresponding QP multipliers $\mu$ are computed. In all that follows we work with a
multiplier vector $\bar\mu$ that is weighted by the norms of the corresponding constraints,

$$ \bar\mu_i = \mu_i\|a_i\|. $$

• Let $\epsilon_M$ denote machine precision. If

$$ \forall i \quad \bar\mu_i \ge -\sqrt{\epsilon_M}, \tag{7.1.1} $$

then $\bar p$ is taken as the search direction.

• If (7.1.1) does not hold, we can take a step away from a subset of the active constraints
while decreasing the value of the QP objective function. To identify the set of active
constraints to be deleted, define

$$ \mu_{\min} = \min_i \bar\mu_i, $$

and introduce a vector $e$, as

$$ e_i = \begin{cases} \|a_i\| & \text{if } \bar\mu_i \le \beta_{mb}\,\mu_{\min}, \\ 0 & \text{otherwise.} \end{cases} $$

For the results presented in the following sections, $\beta_{mb} = 10^{-3}$.

• There is also a limit on the maximum number of constraints to be deleted. If the
previous condition is satisfied by more than a specified number of active constraints,
$\beta_{ml}$, only the $\beta_{ml}$ ones having the smallest multipliers are deleted. For the results
given, $\beta_{ml} = 50$. For most problems this limit has no effect, since the total number of
constraints is less than 50.

• The direction away from the selected constraints is obtained as the least-norm solution
of the system

$$ Ad = e; $$

that is, we define

$$ d_Y = (AY)^{-1}e, \qquad d_Z = 0, $$

to obtain

$$ d = Y d_Y. $$

• If $\alpha_c$ denotes the step to the nearest inactive constraint, and $\alpha_m$ is defined as in
(2.2.9):

$$ \alpha_m = -\frac{(g + H\bar p)^T d}{d^T H d}, $$

we define $\alpha$ as in condition C3:

$$ \alpha = \min(\alpha_c, \alpha_m, \alpha_M), $$

where $\alpha_M$ is $10^{10}$ for this case.

• We obtain the search direction $p$ from (2.2.11):

$$ p = \begin{cases} \bar p + \alpha d & \text{if } \|\bar p\| \le \beta_{slp}\|\bar p + \alpha d\|, \\ \bar p & \text{otherwise,} \end{cases} $$

where $\beta_{slp} = 100$; with this value the step $\alpha d$ is accepted in nearly all cases.

• Finally, the multiplier estimate used in the linesearch is taken to be the QP multiplier
if $p = \bar p$. Otherwise, it is taken to be the least-squares estimate $\lambda_L$ obtained from

$$ AA^T\lambda_L = Ag. $$
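
The selection and acceptance rules in the steps above can be sketched as follows. This is a hedged illustration, not the NPSOL source: all function and parameter names are hypothetical, the default threshold of 1e-3 assumes that the printed value is a negative power of ten, and the linear algebra needed to compute the direction away from the constraints is left out.

```python
import math


def constraints_to_delete(mu_bar, beta_mb=1e-3, beta_ml=50, eps_m=2.0**-52):
    """Indices of active constraints to step away from.

    mu_bar holds the scaled QP multipliers. An empty list means that test
    (7.1.1) holds and the stationary point is used as the search direction.
    """
    if all(m >= -math.sqrt(eps_m) for m in mu_bar):
        return []
    mu_min = min(mu_bar)
    # Keep multipliers that are negative enough relative to the smallest one.
    chosen = [i for i, m in enumerate(mu_bar) if m <= beta_mb * mu_min]
    # Enforce the cap: keep only the beta_ml smallest multipliers.
    chosen.sort(key=lambda i: mu_bar[i])
    return sorted(chosen[:beta_ml])


def step_length(alpha_c, alpha_m, alpha_max=1e10):
    """Step along d: nearest constraint, QP minimizer along d, or a fixed cap."""
    return min(alpha_c, alpha_m, alpha_max)


def search_direction(p_bar, d, alpha, beta_slp=100.0):
    """Accept p_bar + alpha * d unless it is much shorter than p_bar."""
    trial = [pi + alpha * di for pi, di in zip(p_bar, d)]
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return trial if norm(p_bar) <= beta_slp * norm(trial) else list(p_bar)
```

For instance, `constraints_to_delete([0.5, -2.0, -0.001, 1.0])` selects only index 1, because the multiplier -0.001 is not below the threshold of -0.002.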


7.2.  Test  problems 

The two algorithms, NPSOL and its variant using an incomplete solution for the QP
subproblem as the search direction, have been compared by solving a collection of 114 problems
from the literature. Some features of these test problems are given in Table 1, along with
the “optimal” function values obtained in the actual runs.

The  problems  have  been  obtained  from  the  following  sources: 

• Problem 1 is the example problem distributed with NPSOL; its description can be
found in [GMSW86a]. Problems 3 and 4 are slight reformulations of the same problem,
where the bounds $-1 \le x_3 \le 1$ have been replaced by the constraint $x_3^2 \le 1$. Problem
4 uses the same starting point as Problem 1. Problem 3 uses the starting point

,1  2  n  2  1  1  2  _]  _I» 

'  3’  3'  10’  3’  3’  3’  3’  3’  3  0 

•  Descriptions  for  problems  6  and  12-15  can  be  found  in  [MS82].  The  version  of  problem 
G  considered  is  the  one  corresponding  to  a  value  T  —  10.  Problems  12  and  13  start 
from  point  (d)  for  Wright  No.  1  as  indicated  in  the  reference,  while  problems  14  and 
15  st art  from  [joints  (a)  and  (b)  for  Wright  No.  9,  respectively. 

•  A description of the SQUARE ROOT problems (17-20) and of EXPO (9) can be found
in Fraley [Fra88].

•  Problems  21-30  were  obtained  from  Boggs  and  Tolle  [BT84]. 

•  All problems having names starting with “HS” are from Hock and Schittkowski [HS81].

•  Problems 85-95 can be found in Dembo [Dem76].

All  the  above  problems  have  been  used  in  the  past  to  test  NPSOL.  It  should  be  noted  that 
the  problems  in  this  group  are  small;  the  average  number  of  variables  is  10,  and  the  average 
number  of  constraints  is  6.  Nevertheless,  many  of  these  problems  are  considered  hard  to 
solve.  Moreover,  for  some  of  these  problems  the  assumptions  made  in  Chapter  2  to  establish 
the  convergence  results  fail  to  hold:  for  example,  in  some  cases  the  Jacobian  at  the  solution 
is  singular,  or  no  feasible  points  exist  for  some  QP  subproblems. 

In  addition  to  the  previous  set,  the  algorithms  have  been  tested  on  another  group  of 
problems: 

•  The structural optimization problems 99-114 are described in Ringertz [Rin88]. The
letters “I” and “E” in the problem name indicate whether the formulation used included
the displacement variables explicitly (“E”) or eliminated them in advance (“I”). Also, the
following number (10, 25, 36 or 63) denotes the number of bars in the truss considered.
Finally, whenever a number is included at the end of the name (006, 040 or 060), the
initial point has been modified to be $x_j = 6$, 40 or 60, respectively.

These problems have been introduced because of the atypical behavior of quasi-Newton
SQP algorithms on them. For this group, the ratio of QP to nonlinear iterations is large
when compared to the size of the problem; on the first test set (problems 1-98) the average
ratio for NPSOL is 2 QP iterations per nonlinear iteration, while on problems 99-114 the
average ratio is 30.

The  normal  behavior  of  NPSOL  on  the  first  set  of  test  problems  is  to  require  a  relatively 
large  number  of  QP  iterations  in  the  first  few  nonlinear  iterations.  Typically,  the  number 
of  QP  iterations  declines  exponentially  until  near  the  solution,  when  only  one  iteration  is 
required.  As  a  result,  significant  savings  achieved  by  incomplete  solution  of  QP  subproblems 
in  the  early  iterations  are  masked  by  a  large  number  of  subproblems  requiring  only  a  few 
QP  iterations.  As  an  example,  for  problem  98  the  largest  number  of  QP  iterations  needed 
in  any  nonlinear  iteration  is  reduced  from  57  for  NPSOL  to  15  for  the  algorithm  using  early 
termination.  This  effect  is  much  less  clear  when  we  look  at  total  numbers  of  QP  iterations 
(241 for NPSOL vs. 170 for early termination).
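
The masking effect in the problem 98 figures can be made concrete with a little arithmetic. The snippet below is only a worked illustration using the four counts quoted in the paragraph above; the helper name `reduction` is an assumption of this sketch.

```python
def reduction(npsol, early):
    """Fractional reduction achieved by early termination relative to NPSOL."""
    return (npsol - early) / npsol


# Largest number of QP iterations in any single nonlinear iteration.
peak = reduction(57, 15)
# Total number of QP iterations over the whole run.
total = reduction(241, 170)

print(f"peak reduction:  {peak:.1%}")
print(f"total reduction: {total:.1%}")
```

The peak count falls by roughly three quarters, while the total falls by less than a third, because most subproblems already need only a few QP iterations.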

The STRUC problems depart from this “standard” behavior, in the sense that the
number of QP iterations declines much more gradually. (Although only one QP iteration
is required in the end, most nonlinear iterations require more.) This offers the possibility
of  observing  the  reductions  that  can  be  achieved  by  using  the  early-termination  criterion, 
with  limited  distortion  from  the  asymptotic  behavior  of  NPSOL. 

Finally,  the  problems  in  this  second  group  are  larger  than  the  ones  presented  above;  the 
average  number  of  variables  is  now  55,  and  the  average  number  of  constraints  is  100.  For 
all  the  reasons  mentioned,  this  set  of  problems  provides  a  better  environment  in  which  to 
test  the  ability  of  the  proposed  early-termination  criterion  to  reduce  the  total  number  of 
QP  iterations. 

Computing  environment 

Version  4.02  of  NPSOL  was  used  in  the  comparisons,  and  all  parameters  used  in  the  code 
were  given  their  default  values  (see  [GMSW86a]).  No  attempt  has  been  made  to  improve 
the  results  by  selecting  a  different  set  of  parameters,  as  the  main  goal  of  the  comparison  is 
to  determine  the  reliability  of  the  changes  introduced  in  NPSOL. 

The  runs  were  performed  as  batch  jobs  on  a  DEC  VAXstation  II  with  5  megabytes  of 
main memory. The operating system was VAX/VMS version 4.5, and the compiler used
was  VAX  FORTRAN  version  4.6  with  default  options. 
Table 1
Problem Set Description

No.  Problem name          Variables       Linear    Nonlinear        Optimal
                                      constraints  constraints      objective
  1  NPSOL SAMPLE PROBLEM          9            4           14  -.1349963e+01
  2  SINGULAR                      2            0            2   .0000000e+00
  3  HEXAGON                       9            4           15  -.1349963e+01
  4  HEXAGON (ALT. START)          9            4           15  -.1349963e+01
  5  LC7                           7            7            0   .9295973e+06
  6  ALAN MANNE’S PROBLEM         30           10           10  -.2670099e+01
  7  ROSEN-SUZUKI                  4            0            3  -.4400000e+02
  8  QP PROBLEM                    7            7            0  -.1847785e+07
  9  EXPO                          6            0            0   .1866481e-19
 10  STEINKE2                      6            0            4   .4000131e-03
 11  NORWAY                        7            6            0  -.2402344e+02
 12  MHW4                          5            0            3   .2787187e+02
 13  MHW9                          5            0            3  -.3618808e+02
 14  MHW9 INEQUALITY 1             5            0            3  -.2104078e+03
 15  MHW9 INEQUALITY 2             5            0            3  -.6043539e+04
 16  WOPLANT                      12            3            5   .1555716e+02
 17  SQUARE ROOT 1                 9            0            9   .2500000e+04
 18  SQUARE ROOT 2                 9            0            9   .2999795e+01
 19  SQUARE ROOT 3                 9            0            9   .2000000e+01
 20  SQUARE ROOT 4                 4            0            4   .2500000e+04
 21  BT1                           2            0            1  -.1000000e+01
 22  BT2                           3            0            1   .3256820e-01
 23  BT3                           5            3            0   .4093023e+01
 24  BT4                           3            1            1  -.4551055e-03
 25  BT5-HS63                      3            1            1   .9577426e+03
 26  BT6-HS77                      5            0            2   .2415051e+00
 27  BT7                           5            0            3   .3065000e+03
 28  BT8                           5            0            2   .1000000e+01
 29  BT9-HS39                      4            0            2  -.1000000e+01
 30  BT10                          2            0            2  -.1000000e+01
 31  BT11-HS79                     5            0            3   .9171343e-01
 32  BT12                          5            0            3   .6188119e+01
 33  BT13                          5            0            1   .0000000e+00
 34  POWELL TRIANGLES              7            0            5   .2331371e+02
 35  POWELL BADLY SCALED           2            0            1   .1305195e-23
 36  POWELL WRIGGLE                2            0            2  -.1911618e-15
 37  POWELL-MARATOS                2            0            1  -.1000000e+01
 38  HS72                          4            0            2   .7266794e+03
 39  HS73 (CATTLE FEED)            4            2            1   .2989438e+02
 40  HS107                         9            0            6   .5055012e+04
 41  MUKAI-POLAK                   6            0            2   .5000000e+01
 42  INFEASIBLE SUBPROBLEM         2            1            1              —
 43  HS26                          3            0            1   .1969433e-20
 44  HS32                          3            1            1   .1000000e+01
 45  HS46                          5            0            2   .1936782e-22
 46  HS51                          5            3            0   .3851860e-32
 47  HS52                          5            3            0   .5326648e+01
 48  HS53                          5            3            0   .4093023e+01
 49  PENALTY1 A                   50            1            0   .4313635e-01
 50  PENALTY1 B                   50            1            0   .4313635e-01
 51  PENALTY1 C                   50            1            0   .4313635e-01
 52  HS13                          2            0            1   .1002181e+01
 53  HS64                          3            0            1   .6299842e+04
 54  HS65                          3            0            1   .9535289e+00
 55  HS70                          4            0            1   .7498464e-02
 56  HS71                          4            0            2   .1701402e+02
 57  HS74                          4            2            3   .5126498e+04
Table 1 (cont.)
Problem Set Description

No.  Problem name          Variables       Linear    Nonlinear        Optimal
                                      constraints  constraints      objective
 58  HS75                          4            2            3   .5174413e+04
 59  HS78                          5            0            3  -.2919700e+01
 60  HS80                          5            0            3   .5394985e-01
 61  HS81                          5            0            3   .5394985e-01
 62  HS84                          5            0            3   .5329025e+07
 63  HS85                          5            0           38  -.1905155e+01
 64  HS86 (COLVILLE 1)             5           10            0  -.3234868e+02
 65  HS87 (COLVILLE 6)             6            0            4   .8927598e+04
 66  HS93                          6            0            2   .1350760e+03
 67  HS95                          6            0            4   .1561953e-01
 68  HS96                          6            0            4   .1561953e-01
 69  HS97                          6            0            4   .3135809e+01
 70  HS98                          6            0            4   .3135809e+01
 71  HS99                          7            0            2  -.8290102e+09
 72  HS100                         7            0            4   .6806301e+03
 73  HS104                         8            0            5   .3951163e+01
 74  HS105                         8            1            0   .1138418e+04
 75  HS108 (HEXAGON)               9            0           13  -.8660254e+00
 76  HS109                         9            1            8   .5362069e+04
 77  HS110                        10            0            0  -.4577847e+02
 78  HS111                        10            0            3  -.4773239e+02
 79  HS112 (CHEMICAL EQ.)         10            3            0  -.4776109e+02
 80  HS113                        10            3            5   .2430621e+02
 81  HS114                        10            5            6  -.1768807e+04
 82  HS117 (COLVILLE 2)           15            0            5   .3234868e+02
 83  HS118 (LC PROBLEM)           15           17            0   .6648204e+03
 84  HS119 (COLVILLE 7)           16            8            0   .2448997e+03
 85  DEMBO 1B                     12            0            3   .3168222e+01
 86  DEMBO 2-HS83                  5            0            6   .1012243e+05
 87  DEMBO 3                       7            4           10   .1227226e+04
 88  DEMBO 4A                      8            0            4   .3951163e+01
 89  DEMBO 4C                      9            0            5   .3952139e+01
 90  DEMBO 5-HS106                 8            3            3   .7049248e+04
 91  DEMBO 6-HS116                13            3           10   .9758751e+02
 92  DEMBO 7                      16            8           11   .1747870e+03
 93  DEMBO 8A                      7            0            4   .1809765e+04
 94  DEMBO 8B                      7            0            4   .9118806e+03
 95  DEMBO 8C                      7            0            4   .5436680e+03
 96  OPE                          67            0           60   .9927005e+00
 97  GBD EQUILIBRIUM MODEL        44           38            6   .4510281e-16
 98  WEAPON ASSIGNMENT           100           12            0  -.1735019e+04
 99  STRUCI10KON                  10            0           11   .4156398e+04
100  STRUCE10KON                  18           10            8   .4156398e+04
101  STRUCI10VAN                  10            0           12   .5076669e+04
102  STRUCE10VAN                  18           10            8   .5076669e+04
103  STRUCI25006                   8            0           74   .5451627e+03
104  STRUCE25006                  44           50           36   .5451627e+03
105  STRUCI25DAT                   8            0           74   .5451627e+03
106  STRUCE25DAT                  44           50           36   .5451627e+03
107  STRUCI36DAT                  21            0           76   .3389915e+05
108  STRUCE36DAT                  75           72           54   .3389915e+05
109  STRUCI63040                  63            0          128   .6117064e+04
110  STRUCE63040                 147          126           84   .6117064e+04
111  STRUCI63060                  63            0          128   .6117064e+04
112  STRUCE63060                 147          126           84   .6117064e+04
113  STRUCI63DAT                  63            0          128   .6117064e+04
114  STRUCE63DAT                 147          126           84   .6117064e+04

7.3.  Results

The  results  obtained  from  running  both  algorithms  on  the  test  set  described  in  the  previous 
section  are  presented  in  Table  4. 

The  parameters  chosen  to  characterize  the  relative  performance  of  both  algorithms  have 
been:  the  number  of  outer  (nonlinear)  iterations  for  each  problem;  the  number  of  calls  to 
the  routine  computing  the  values  of  the  objective  function,  the  constraint  functions  and 
their  derivatives  (function  evaluations);  the  total  number  of  inner  (QP)  iterations  for  the 
problem  (including  the  number  of  iterations  necessary  to  compute  a  feasible  point);  and 
the  running  (CPU)  time  needed  to  solve  the  problem.  The  results  corresponding  to  both 
algorithms  are  given  as  a  single  entry  in  the  tables,  in  the  form 

NPSOL result/Early-termination result.

Given that many of the problems are not convex, the algorithms may converge to different
solutions. A few such events are indicated in Table 4. Another possible outcome is
failure; that is, the algorithm terminates without finding a solution, because the iteration
limit has been exceeded, because no significant progress can be made at the current point
with respect to the merit function, or because the objective or constraint functions need
to be evaluated at a point for which they are not defined in the code. Such failures are
indicated by “—”.

To  summarize  the  results  from  the  test  set  we  now  give  statistics  for  the  whole  set  of 
problems.  We  start  by  presenting  in  the  following  table  the  number  of  failures  for  both 
algorithms.  These  values  illustrate  the  reliability  of  the  early-termination  algorithm:  it  is 
able  to  solve  98%  of  the  number  of  problems  solved  by  NPSOL,  and  92%  of  all  the  problems 
attempted. 

Table  2 


Problems  Successfully  Solved 
NPSOL  Early  termination 


Table 3 presents a summary of the results for the four quantities monitored in Table 4.
The values have been computed as the geometric means of the ratios of the values for
NPSOL and for the early-termination algorithm; that is, entries larger than one indicate
that the corresponding value for NPSOL is larger than the value for the early-termination
code (excluding those problems where one of the algorithms failed). Separate entries have
been provided for problems 1-98 (the smaller problems), and for problems 99-114 (the
structural optimization problems).
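
The summary statistic described above can be sketched as follows. This is a minimal illustration, not the procedure actually used to produce Table 3: the function names and the string format of the entries (an "NPSOL/early-termination" pair, with a dash marking a failed run) are assumptions of the sketch, as is the decision to skip zero counts.

```python
import math

FAILURE_MARKS = ("-", "—", "")


def parse_entry(entry):
    """Split an 'NPSOL/early-termination' entry into two numbers.

    A failed run (a dash, possibly followed by a footnote mark) becomes None.
    """
    def to_number(text):
        text = text.strip().rstrip("*†").strip()
        return None if text in FAILURE_MARKS else float(text)

    left, right = entry.split("/")
    return to_number(left), to_number(right)


def geometric_mean_ratio(entries):
    """Geometric mean of NPSOL/early-termination ratios over solved problems."""
    logs = []
    for entry in entries:
        npsol, early = parse_entry(entry)
        # Problems where either algorithm failed are excluded; zero counts
        # are also skipped because the ratio is undefined (sketch assumption).
        if not npsol or not early:
            continue
        logs.append(math.log(npsol / early))
    if not logs:
        raise ValueError("no problem was solved by both algorithms")
    return math.exp(sum(logs) / len(logs))
```

For instance, `geometric_mean_ratio(["45/34", "4/4", "32/29"])` applies the rule to the QP-iteration entries of the first three rows of Table 4; a result above one means NPSOL needed more QP iterations on average.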


Table 3
Average Behavior: NPSOL vs. Early Termination

                                  Problems
                          All       1-98     99-114
Nonlinear iterations     .988       .979      1.044
Function evaluations     .994       .999       .963
QP iterations           1.190      1.112      1.884
CPU time                1.043      1.022      1.200

We  now  comment  briefly  on  the  implications  of  these  results. 

•  The early-termination rule seems to behave very well regarding the numbers of nonlinear
iterations and function evaluations; even if we are now using a search direction
of “worse quality” than in NPSOL, the numbers are very close for both algorithms.

•  The number of QP iterations is reduced by 20% for the complete set. When judging
this figure we must take into account that the problems are small, implying that
the number of QP iterations required per nonlinear iteration is also small. (In fact,
the average value for the test set is 5.6 QP iterations per nonlinear iteration.) The
opportunity for improvement is correspondingly limited. Moreover, both codes use the
active set at the solution of the previous QP subproblem as a prediction for the correct
active set in the current subproblem, resulting in a small number of QP iterations close
to the solution. Finally, the early-termination rule still requires a feasible point, and
the feasibility phase is the same as in NPSOL. When this phase accounts for most
of the total number of iterations, as with the STRUC problems, the possibility of
improvement is further diminished.

Nonetheless, it should be noted that for problems 99-114 the improvement obtained
is significantly greater than 20%, as the mean ratio is now 1.88; in fact, when we
look only at the larger problems, the relative performance of the early-termination
algorithm improves markedly. This offers the promise that for even larger problems
the results obtained may be substantially better than the values shown above.

•  The CPU time required by the early-termination algorithm is lower than the time for
NPSOL, but by a factor that is much smaller than for the number of QP iterations.
This is due not only to the fact that function evaluations can be expensive when
compared to the effort to solve each QP subproblem, but also to some details in
the  implementation  that  have  been  chosen  to  affect  the  number  of  QP  iterations, 
even  at  the  expense  of  running  time.  For  example,  the  multiplier  estimate  used 
for  the  linesearch  (the  least-squares  multiplier)  is  expensive  to  compute  when  many 
constraints  are  deleted  in  the  last  step,  as  the  factorization  for  the  Jacobian  of  the 
active  constraints  must  be  updated.  There  are  still  options  to  be  explored  that  might 
improve  the  running  times  for  the  modified  algorithm. 

Finally,  Figures  1  and  2  show  plots  of  the  results  included  in  Table  4,  in  an  attempt  to 
make  these  results  more  easily  understandable.  The  vertical  axes  give  the  base  2  logarithms 
of  the  ratios  between  the  corresponding  values  for  NPSOL  and  the  early-termination  (ET) 
algorithm.  A  value  of  1  would  correspond  to  a  case  in  which  NPSOL  requires  twice  the 
number  of  nonlinear  iterations,  or  function  evaluations,  etc.  needed  by  the  early  termination 
algorithm. 
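
The quantity plotted on the vertical axes can be computed as in the following sketch. The function name `log2_ratio` is an assumption of this illustration; the sample values are the QP-iteration entry for problem 1 (45/34) and the nonlinear-iteration entry for problem 5 (7/9) from Table 4.

```python
import math


def log2_ratio(npsol, early):
    """Base 2 logarithm of the NPSOL value over the early-termination value."""
    return math.log2(npsol / early)


# Positive: NPSOL needed more QP iterations than early termination.
problem_1_qp = log2_ratio(45, 34)
# Negative: early termination needed more nonlinear iterations than NPSOL.
problem_5_nonlinear = log2_ratio(7, 9)
```

On this scale equal performance maps to zero, and a factor of two in either direction maps to plus or minus one, so gains and losses of the same relative size appear symmetric.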
Table 4
Numerical Results

No.  Problem name             Nonlinear     Function          QP          CPU
                             iterations  evaluations  iterations     time (s)
  1  NPSOL SAMPLE PROBLEM         12/13        16/18       45/34    3.69/3.61
  2  SINGULAR                     15/15        16/16         4/4    1.03/1.05
  3  HEXAGON                      15/16        21/23       32/29    4.41/4.41
  4  HEXAGON (ALT. START)         11/11        16/14       35/26    3.56/3.26
  5  LC7                            7/9         9/11       13/16      .76/.95
  6  ALAN MANNE’S PROBLEM         17/17        18/18       40/37  21.13/21.92
  7  ROSEN-SUZUKI                   8/8        11/11         9/9      .81/.81
  8  QP PROBLEM                    8/10         9/11       23/15    1.10/1.04
  9  EXPO                         33/53        35/57       38/57    1.96/3.08
 10  STEINKE2                       -7*           /6        / '4        -7-87
 11  NORWAY                        4/0(          5/7       34/13     1.23/.65
 12  MHW4                         10/10        18/15       14/12    1.31/1.25
 13  MHW9                        30/19†        56/28       42/24    3.71/2.31
 14  MHW9 INEQUALITY 1            28/23        38/28       59/40    3.41/2.73
 15  MHW9 INEQUALITY 2           41/14†        58/27       80/24    4.83/1.77
 16  WOPLANT                      25/29        29/33       44/35    6.85/7.17
 17  SQUARE ROOT 1                —*/—*          —/—         —/—          —/—
 18  SQUARE ROOT 2                23/23        36/36         0/0    5.01/5.32
 19  SQUARE ROOT 3                  6/6          9/9         7/7      .95/.94
 20  SQUARE ROOT 4                 —/—*          —/—         —/—          —/—
 21  BT1                          11/11        19/19       11/11      .81/.83
 22  BT2                            9/9        14/14         9/9      .71/.70
 23  BT3                            2/2          5/5         2/2      .19/.19
 24  BT4                          12/12        18/18       13/13      .92/.92
 25  BT5-HS63                       6/6          9/9         8/8      .58/.58
 26  BT6-HS77                     15/15        21/21       16/16    1.52/1.54
 27  BT7                          31/31        56/56       32/32    3.36/3.43
 28  BT8                          17/17        19/19       17/17    1.25/1.44
 29  BT9-HS39                     13/13        16/16       14/14     OS/ 1 iq
 30  BT10                           8/8        11/11         0/0      .48/.52
 31  BT11-HS79                      9/9        12/12       10/10   1.05/1.0''
 32  BT12                         27/27        57/57       28/28   3.04/3.0-,
 33  BT13                         32/32        44/44       34/34    2.61/2.62
 34  POWELL TRIANGLES             23/15        37/16       36/23    3.27/2.28
 35  POWELL BADLY SCALED          12/12        15/15       13/13      .85/.85
 36  POWELL WRIGGLE               34/32        69/55       60/40    2.77/2.39
 37  POWELL-MARATOS                 6/6          7/7         6/6      .44/.44
 38  HS72                           7/7          8/8         8/8      .69/.67
 39  HS73 (CATTLE FEED)             4/4          5/5         4/4      .38/.36
 40  HS107                        11/11        18/18       27/18    2.77/2.56
 41  MUKAI-POLAK                  10/10        16/16       13/13    1.08/1.11
 42  INFEASIBLE SUBPROBLEM         —/—*          —/—         —/—          —/—
 43  HS26                         47/47        64/64       48/48    3.39/3.41
 44  HS32                           2/4          3/5         3/5      .25/.38
 45  HS46                         55/55        58/58       56/56    5.26/4.98
 46  HS51                           2/2          5/5         2/2      .18/.14
 47  HS52                           2/2          5/5         2/2      .19/.16
 48  HS53                           2/2          5/5         2/2      .19/.16
 49  PENALTY1 A                   16/16        18/19       77/41  20.01/16.49
 50  PENALTY1 B                     6/7        14/13       67/32  14.77/11.77
 51  PENALTY1 C                   29/15        85/40      152/65  24.35/11.65
 52  HS13                         22/19        23/20       13/10    1.29/1.22
 53  HS64                         29/43        39/62       47/60    2.34/3.33
 54  HS65                           8/9        10/11       16/16      .70/.78
 55  HS70                         36/—*         39/—        39/—       3.33/—
 56  HS71                           5/7          6/9         9/9      .53/.67
 57  HS74                         10/26        15/48       14/28    1.17/2.68

*  Failed to solve the problem.

†  Converged to a different minimizer.
Table 4 (cont.)
Numerical Results

No.  Problem name             Nonlinear     Function          QP          CPU
                             iterations  evaluations  iterations     time (s)
 58  HS75                           0/8        10/11         7/9      .72/.90
 59  HS78                         10/10        14/14       11/11    1.15/1.15
 60  HS80                           8/8        10/10         8/8      .92/.92
 61  HS81                     1 >1 / 11        20/20       15/15    1.57/1.60
 62  HS84                          -*/<         - /*         -A*      / - 5 1
 63  HS85                         17/14        18/15       33/20    4.00/3.12
 64  HS86 (COLVILLE 1)
Figure 2.  QP iterations: NPSOL vs. Early termination

From Figures 1 and 2 it can be noticed that the results obtained present a significant
lack of correlation from one problem to the next; the comments offered earlier in this section
apply when the average behaviors are considered, rather than for each individual problem.
In Figure 1, the values for the numbers of nonlinear iterations and function evaluations are
clearly  clustered  around  zero,  with  relatively  small  deviations  from  the  average.  In  contrast 
to  these  results,  the  predominance  of  positive  values  for  the  number  of  QP  iterations  can 
be  easily  appreciated  in  Figure  2,  especially  for  those  (larger)  problems  beyond  problem  92. 
7.4.  Further work

We conclude the report with some comments on those areas where further improvement in
the algorithm is desirable.

•  Two of the assumptions introduced in Chapter 2 were the nonsingularity of the Jacobian
for the active constraints at the solution, and the existence of a feasible region
for all QP subproblems. Many of the failures in the solution of the test problems can
be attributed to the corresponding subproblems lacking one of these properties (or
being close to violating them). NPSOL includes rules to deal with these difficulties,
but they are not guaranteed to be able to cope with all possible situations, particularly
in the case of infeasible subproblems. A third related issue that appeared several
times in the solution of the problem set was the need for a disproportionate effort to
obtain feasible points for the QP subproblems. In some of the problems the work to
obtain a feasible point was far greater than the remaining work needed to compute a
satisfactory search direction. For example, in problem number 114, 80% of the quite
considerable solution time was spent in the feasibility phase by both algorithms.

These  last  two  issues  are  closely  related.  It  can  be  expected  that  a  procedure  to 
terminate the feasibility phase early may not only yield further reductions in the total
number  of  QP  iterations  needed  to  solve  the  problems,  but  at  the  same  time  may 
provide  a  way  to  deal  with  infeasible  QP  subproblems. 

•  Another open area, also related to the assumptions made in Chapter 2, is the theoretical
study of the relaxation of the strict complementarity requirement. Some recent
work on this topic by Burke [Bur89] indicates that it might still be possible to identify
a satisfactory active set at the solution in a finite number of iterations. Several other
associated issues are also open: for example, determination of the best strategy to
compute a Lagrange multiplier estimate when the Jacobian is becoming progressively
more ill-conditioned, and study of the theoretical rate of convergence achievable by
the algorithm when strict complementarity does not hold.

•  Finally, a more general issue is identification of the best strategy for the solution of the
QP subproblems in the large-scale case. This report focused on active-set methods,
but recently there has been great interest in the use of interior-point methods, in
which the inequality constraints are rewritten in the form of equality constraints and
simple bounds, and a barrier function formulation is used to move the simple bounds
into the objective function. These methods may become a promising alternative for
use within our framework (to solve the QP subproblems), as they seem able to avoid
the exponential complexity associated with determination of the correct active set.
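
The reformulation mentioned in the last item can be written out schematically. The display below is a generic textbook form given only for illustration, not a formulation taken from this report: the constraint data $A$ and $b$, the slack vector $s$ and the barrier parameter $\mu > 0$ are assumptions of the sketch, and the QP objective is assumed to be $g^T p + \tfrac12 p^T H p$.

```latex
\begin{aligned}
&\min_{p}\; g^T p + \tfrac12 p^T H p
  \quad \text{subject to} \quad A p \ge b \\
\Longrightarrow\;
&\min_{p,\,s}\; g^T p + \tfrac12 p^T H p
  \quad \text{subject to} \quad A p - s = b,\;\; s \ge 0 \\
\Longrightarrow\;
&\min_{p,\,s}\; g^T p + \tfrac12 p^T H p - \mu \sum_i \ln s_i
  \quad \text{subject to} \quad A p - s = b .
\end{aligned}
```

The first step introduces slacks so that only equality constraints and simple bounds remain; the second replaces the bounds by a logarithmic barrier term, so that no active set has to be identified while $\mu$ is driven to zero.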

Exploration of these alternatives offers a great number of possibilities for further
research in the quest for a satisfactory method to solve large-scale nonlinear programs.


References 


[Big72]  M.C. Biggs (1972), Constrained minimization using recursive equality quadratic
programming, in: F.A. Lootsma, Ed., Numerical Methods for Nonlinear
Optimization (Academic Press, London/New York).

[BT84]  P.T. Boggs and J.W. Tolle (1984), A family of descent functions for constrained
optimization, SIAM Journal on Numerical Analysis 21 1146-1161.

[BTW82]  P.T. Boggs, J.W. Tolle and P. Wang (1982), On the local convergence of quasi-Newton
methods for constrained optimization, SIAM Journal on Control and
Optimization 20 161-171.

[Bur89]  J.V. Burke (1989), On the identification of active constraints II: The nonconvex
case, Preprint MCS-P43-0189, Mathematics and Computer Science
Division, Argonne National Laboratory.

[BN88]  R.H. Byrd and J. Nocedal (1988), An analysis of reduced Hessian methods
for constrained optimization, Technical Report CU-CS-398-88, Department
of Computer Science, University of Colorado.

[CDT85]  M.R. Celis, J.E. Dennis, Jr. and R.A. Tapia (1985), A trust region strategy for
nonlinear equality constrained optimization, in: P.T. Boggs, R.H. Byrd and
R.B. Schnabel, Eds., Numerical Optimization 1984 (SIAM, Philadelphia).

[CC84]  T.F. Coleman and A.R. Conn (1984), On the local convergence of a quasi-Newton
method for the nonlinear programming problem, SIAM Journal on
Numerical Analysis 21 755-769.

[CG84]  A.R. Conn and N.I.M. Gould (1984), On the location of directions of infinite
descent for nonlinear programming algorithms, SIAM Journal on Numerical
Analysis 21 1162-1179.

[Dem76]  R.S. Dembo (1976), A set of geometric programming test problems and their
solutions, Mathematical Programming 10 192-213.

[DT85]  R.S. Dembo and U. Tulowitzki (1985), Sequential truncated quadratic programming
methods, in: P.T. Boggs, R.H. Byrd and R.B. Schnabel, Eds.,
Numerical Optimization 1984 (SIAM, Philadelphia).

[DM77]  J.E. Dennis, Jr. and J.J. Moré (1977), Quasi-Newton methods, motivation
and theory, SIAM Review 19 46-89.

[DS83]  J.E. Dennis, Jr. and R.B. Schnabel (1983), Numerical Methods for Unconstrained
Optimization and Nonlinear Equations (Prentice Hall, Englewood
Cliffs, NJ).

[Fen87]  P. Fenyes (1987), Partitioned quasi-Newton methods for nonlinear equality
constrained optimization, Ph.D. dissertation, Cornell University.

[FMC68]  A.V. Fiacco and G.P. McCormick (1968), Nonlinear Programming: Sequential
Unconstrained Minimization Techniques (John Wiley and Sons, New
York/London/Sydney/Toronto).

[Fle70]  R. Fletcher (1970), A class of methods for nonlinear programming with termination
and convergence properties, in: J. Abadie, Ed., Integer and Nonlinear
Programming (North Holland, Amsterdam).

[Fle85]  R. Fletcher (1985), An $\ell_1$ penalty method for nonlinear constraints, in: P.T.
Boggs, R.H. Byrd and R.B. Schnabel, Eds., Numerical Optimization 1984
(SIAM, Philadelphia).

[Fra88]  C. Fraley (1988), Software performance on nonlinear least-squares problems,
SOL Report 88-17, Department of Operations Research, Stanford University.

[Gil87]  J.C. Gilbert (1987), Maintaining the positive definiteness of the matrices in
reduced Hessian methods for equality constrained optimization, IIASA Technical
Report WP-87-123 (Laxenburg, Austria).

[GMSW86a]  P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright (1986), User’s guide for
NPSOL (Version 4.0): a Fortran package for nonlinear programming, Report
SOL 86-2, Department of Operations Research, Stanford University.

[GMSW86b]  P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright (1986), Some theoretical
properties of an augmented Lagrangian merit function, Report SOL
86-6R, Department of Operations Research, Stanford University.




[GMSW88]  P.E. Gill, W. Murray, M.A. Saunders and M.H. Wright (1988), Recent developments
in constrained optimization, Journal of Computational and Applied
Mathematics 22 257-270.

[GMW81]  P.E. Gill, W. Murray and M.H. Wright (1981), Practical Optimization (Academic
Press, London/New York).

[Go85]  J. Goodman (1985), Newton’s method for constrained optimization, Mathematical
Programming 33 162-171.

[Gur87]  C.B. Gurwitz (1987), Sequential quadratic programming methods based on
approximating a projected Hessian matrix, Technical Report 219, Dept. of
Computer Science, Courant Institute of Mathematical Sciences.

[Han76]  S.-P. Han (1976), Superlinearly convergent variable metric algorithms for general
nonlinear programming problems, Mathematical Programming 11 263-282.

[Han77]  S.-P. Han (1977), Dual variable metric algorithms for constrained optimization,
SIAM Journal on Control and Optimization 15 546-565.

[HS81]  W. Hock and K. Schittkowski (1981), Test examples for nonlinear programming,
Lecture Notes in Economics and Mathematical Systems, Vol. 187
(Springer-Verlag, Berlin/Heidelberg/New York).

[McC77]  G. McCormick (1977), A modification of Armijo’s step-size rule for negative
curvature, Mathematical Programming 13 111-115.

[MS84]  J.J. Moré and D.C. Sorensen (1984), Newton’s method, in: G.H. Golub, Ed.,
Studies in Numerical Analysis (Mathematical Association of America) 29-82.

[Mu69]  W. Murray (1969), An algorithm for constrained minimization, in: R.
Fletcher, Ed., Optimization (Academic Press, London/New York).

[MW78]  W. Murray and M.H. Wright (1978), Projected Lagrangian methods based on
the trajectories of penalty and barrier functions, SOL Report 78-23, Department
of Operations Research, Stanford University.

[MS82]  B.A. Murtagh and M.A. Saunders (1982), A projected Lagrangian algorithm
and its implementation for sparse nonlinear constraints, Mathematical Programming
Study 16 84-117.




[N0S5]  J.  Nocedal  and  M.L.  Overton  (1985),  Projected  Hessian  updating  algorithms 

for  nonli nearly  constrained  optimization,  SIAM  Journal  on  Numerical  Anal¬ 
ysis  22  821  850. 

[Po78]  M.J.D.  Powell  (1978),  A  fast  algorithm  for  nonlinearlv  constrained  calcula¬ 

tions,  in:  O.L.  Mangasarian,  R.R.  Meyer  and  S.M.  Robinson,  Eds.,  Nonlinear 
Prognimming  3  (Academic  Press,  New  York). 

[Po83] M.J.D. Powell (1983), Variable metric methods for constrained optimization,
in: A. Bachem, M. Grötschel and B. Korte, Eds., Mathematical Programming:
The State of the Art (Springer, Berlin/Heidelberg/New York/Tokyo).

[PY86] M.J.D. Powell and Y. Yuan (1986), A recursive quadratic programming
algorithm that uses differentiable exact penalty functions, Mathematical
Programming 35 265-278.

[Rin88] U.T. Ringertz (1988), A mathematical programming approach to structural
optimization, Report No. 88-24, Dept. of Aeronautical Structures and
Materials, The Royal Institute of Technology, Stockholm.

[Rob74] S.M. Robinson (1974), Perturbed Kuhn-Tucker points and rates of
convergence for a class of nonlinear programming algorithms, Mathematical
Programming 7 1-16.

[Sch81] K. Schittkowski (1981), The nonlinear programming method of Wilson, Han
and Powell with an augmented Lagrangian line search function, Numerische
Mathematik 38 83-114.

[Sto85] J. Stoer (1985), Principles of sequential quadratic programming methods
for solving nonlinear programs, in: K. Schittkowski, Ed., Computational
Mathematical Programming, NATO ASI Series, Vol. F15 (Springer-Verlag,
Berlin/Heidelberg) 165-207.

[Tap77] R.A. Tapia (1977), Diagonalized multiplier methods and quasi-Newton
methods for constrained optimization, Journal of Optimization Theory and
Applications 22 135-194.

[Wil63] R.B. Wilson (1963), A simplicial algorithm for concave programming, Ph.D.
Thesis, Harvard University.

[Wri76] M.H. Wright (1976), Numerical methods for nonlinearly constrained
optimization, Ph.D. Thesis, Stanford University.




UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER
SOL 89-7

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)
Sequential Quadratic Programming Algorithms for Optimization

5. TYPE OF REPORT & PERIOD COVERED
Technical Report

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)
Francisco J. Prieto

9. PERFORMING ORGANIZATION NAME AND ADDRESS
Department of Operations Research - SOL
Stanford University
Stanford, CA 94305-4022

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS
1111MA

11. CONTROLLING OFFICE NAME AND ADDRESS
Office of Naval Research - Dept. of the Navy
800 N. Quincy Street
Arlington, VA 22217

12. REPORT DATE
August 1989

13. NUMBER OF PAGES
145 pages

15. SECURITY CLASS. (of this report)
UNCLASSIFIED

16. DISTRIBUTION STATEMENT (of this Report)
This document has been approved for public release and sale;
its distribution is unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
sequential quadratic programming; large-scale optimization;
second-derivative methods

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
Please see reverse side...

DD FORM 1473 (1 JAN 73)  EDITION OF 1 NOV 65 IS OBSOLETE



SOL  89-7:  Sequential  Quadratic  Programming  Algorithms  for  Optimization,  Francisco  J. 
Prieto  (August  1989,  145  pp.). 

The problem considered in this dissertation is that of finding local minimizers for a function subject
to general nonlinear inequality constraints, when first and perhaps second derivatives are available.
The methods studied belong to the class of sequential quadratic programming (SQP) algorithms. In
particular, the methods are based on the SQP algorithm embodied in the code NPSOL, which was
developed at the Systems Optimization Laboratory, Stanford University.

The goal of the dissertation is to develop SQP algorithms that allow some flexibility in their design.
Specifically, we are interested in introducing modifications that enable the algorithms to solve large-scale
problems efficiently. The following issues are considered in detail:

• The use of approximate solutions for the QP subproblem. Instead of trying to obtain the search
direction as a minimizer for the QP, the solution process is terminated after a limited number of
iterations. Suitable termination criteria are defined that ensure convergence for an algorithm that
uses a quasi-Newton approximation for the full Hessian. Theorems concerning the rate of
convergence are also given.

• The use of approximations for the reduced Hessian in the construction of the QP subproblems.
For many problems the reduced Hessian is considerably smaller than the full Hessian.
Consequently, there are considerable practical benefits to be gained by only requiring an
approximation to the reduced Hessian. Theorems are proved concerning the convergence and rate of
convergence for an algorithm that uses a quasi-Newton approximation for the reduced Hessian when
early termination of the QP subproblem is enforced.

• The use of second derivatives, while having significant practical advantages, introduces new
difficulties; for example, the QP subproblems may be non-convex, and even a minimizer for the
subproblem is no longer guaranteed to yield a suitable search direction. It is shown how to construct
suitable search directions from approximate solutions to the QP subproblem. Also, theorems are
proved for the convergence and rate of convergence of these algorithms.

Finally, some numerical results, obtained from a modification of the code NPSOL, are presented.
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